Full Code of yjwong1999/Twin-TD3 for AI

main 1e172dc58daa cached

47 files

180.3 KB

49.4k tokens

181 symbols

1 requests

Download .txt

Repository: yjwong1999/Twin-TD3
Branch: main
Commit: 1e172dc58daa
Files: 47
Total size: 180.3 KB

Directory structure:
gitextract_sya5mzu7/

├── README.md
├── batch_eval.sh
├── batch_train.sh
├── channel.py
├── data/
│   ├── .vscode/
│   │   └── settings.json
│   ├── data_test.py
│   ├── init_location.xlsx
│   ├── readme.md
│   ├── simulation_result.xlsx
│   └── storage/
│       ├── benchmark/
│       │   ├── ddpg_see_benchmark/
│       │   │   ├── Actor_UAV_ddpg
│       │   │   ├── Critic_UAV_ddpg
│       │   │   ├── TargetActor_UAV_ddpg
│       │   │   └── TargetCritic_UAV_ddpg
│       │   ├── ddpg_ssr_benchmark/
│       │   │   ├── Actor_UAV_ddpg
│       │   │   ├── Critic_UAV_ddpg
│       │   │   ├── TargetActor_UAV_ddpg
│       │   │   └── TargetCritic_UAV_ddpg
│       │   ├── td3_see_benchmark/
│       │   │   ├── Actor_UAV_TD3
│       │   │   ├── Critic_1_UAV_TD3
│       │   │   ├── Critic_2_UAV_TD3
│       │   │   ├── TargetActor_UAV_TD3
│       │   │   ├── TargetCritic_1_UAV_TD3
│       │   │   └── TargetCritic_2_UAV_TD3
│       │   └── td3_ssr_benchmark/
│       │       ├── Actor_UAV_TD3
│       │       ├── Critic_1_UAV_TD3
│       │       ├── Critic_2_UAV_TD3
│       │       ├── TargetActor_UAV_TD3
│       │       ├── TargetCritic_1_UAV_TD3
│       │       └── TargetCritic_2_UAV_TD3
│       └── readme.md
├── data_manager.py
├── ddpg.py
├── entity.py
├── env.py
├── env1.py
├── load_and_plot.py
├── main_RIS.py
├── main_ref.py
├── main_train.py
├── math_tool.py
├── plot_see.py
├── plot_ssr.py
├── plot_traj.py
├── render.py
├── requirements.txt
├── run_simulation.py
└── td3.py

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
# Deep Reinforcement Learning for Secrecy Energy-Efficient UAV Communication with Reconfigurable Intelligent Surfaces

**IEEE Wireless Communications and Networking Conference 2023 (WCNC 2023)** </br>
Simulation for Conference Proceedings https://doi.org/10.1109/WCNC55385.2023.10118891 </br>
Refer [this link](https://github.com/yjwong1999/Twin-TD3/blob/main/WCNC2023%20WS-09%20%231570879488.pdf) for the preprint

## Abstract
This paper investigates the **physical layer security (PLS)** issue in **reconfigurable intelligent surface (RIS) aided millimeter-wave rotary-wing unmanned aerial vehicle (UAV) communications** under the presence of multiple eavesdroppers and imperfect channel state information (CSI). The goal is to maximize the **worst-case secrecy energy efficiency (SEE)** of UAV via a **joint optimization of flight trajectory, UAV active beamforming and RIS passive beamforming**. By interacting with the dynamically changing UAV environment, real-time decision making per time slot is possible via deep reinforcement learning (DRL). To decouple the continuous optimization variables, we introduce a **twin twin-delayed deep deterministic policy gradient (TTD3)** to maximize the expected cumulative reward, which is linked to SEE enhancement. Simulation results confirm that the proposed method achieves greater secrecy energy savings than the traditional twin-deep deterministic policy gradient DRL (TDDRL)-based method. 

## TLDR

### System model: 
**RIS-aided mmWave UAV system** under the presence of **eavesdroppers** and **imperfect channel state information (CSI)** </br>

### Solution: 
A **Twin-TD3 (TTD3) algorithm** to decouple the joint optimization of:
1. UAV active beamforming and RIS passive beamforming 
2. UAV flight trajectory

We adopt **double DRL framework**, where the 1st and 2nd agent provides the policy for task (1) and (2), respectively.

## How to use this repo

Setup the repo
```
conda create --name <env> python=3.10.4
conda activate <env>
git clone https://github.com/yjwong1999/Twin-TD3.git
cd Twin-TD3
pip install -r requirements.txt
```

User can train two types of algorithm for training:
1. Twin DDPG is [TDDRL algorithm](https://doi.org/10.1109/LWC.2021.3081464)
2. Twin TD3 is our proposed TTD3 algorithm

Run the following  in the `bash` or `powershell`

`main_train.py` is the main python file to train the DRL algorithms
```shell
# To use Twin DDPG with SSR as optimization goal
python3 main_train.py --drl ddpg --reward ssr

# To use Twin TD3 with SSR as optimization goal
python3 main_train.py --drl td3 --reward ssr

# To use Twin DDPG with SEE as optimization goal
python3 main_train.py --drl ddpg --reward see

# To use Twin TD3 with SEE as optimization goal
python3 main_train.py --drl td3 --reward see



# To use pretrained DRL for UAV trajectory (recommended for stable convergence)
python3 main_train.py --drl td3 --reward see --trained-uav

# To set number of episodes (default is 300)
python3 main_train.py --drl td3 --reward see --ep-num 300

# To set seeds for DRL weight initialization (not recommended)
python3 main_train.py --drl td3 --reward see --seeds 0       # weights of both DRL are initialized with seed 0
python3 main_train.py --drl td3 --reward see --seeds 0 1     # weights of DRL 1 and DRL2 are initialized with seed 0 and 1, respectively
```

`run_simulation.py` is the python file to run the simulation using your trained models
```shell
# plot everything for each episode
python3 run_simulation.py --path data/storage/scratch/<DIR>       # if you train the algorithm without the pretrained uav
python3 run_simulation.py --path data/storage/trained_uav/<DIR>   # if you train the algorithm with the pretrained uav
```


`load_and_plot.py` is the python file to plot the (i) Rewards, (ii) Sum Secrecy Rate (SSR), (iii) Secrecy Energy Efficient (SEE), (iv) UAV Trajectory, (v) RIS configs for each episode in one experiments. The plotted figures are saved at `data/storage/scratch/<DIR>/plot` or `data/storage/trained_uav/<DIR>plot`
```shell
# plot everything for each episode
python3 load_and_plot.py --path data/storage/scratch/<DIR> --ep-num 300       # if you train the algorithm without the pretrained uav
python3 load_and_plot.py --path data/storage/trained_uav/<DIR> --ep-num 300   # if you train the algorithm with the pretrained uav
```

Note that you can use the bash script `batch_train.sh` and `batch_eval.sh` to train the algorithms and evaluate them using the previous two python codes
```shell
# To train on batch
bash batch_train.sh

# To evaluate on batch
bash batch_eval.sh
```

## Results

We run the ```main_train.py``` for 5 times for each settings below, and averaged out the performance

SSR and SEE              (the higher the better)
Total Energy Consumption (the lower the better)

| Algorithms                     | SSR (bits/s/Hz)| Energy (kJ) | SEE (bits/s/Hz/kJ)|
|--------------------------------|----------------|-------------|-------------------|
| TDDRL                          | 5.03           | 12.4        | 40.8              |
| TTD3                           | 6.05           | 12.7        | 48.2              |
| TDDRL (with energy constraint) | 4.68           | 11.2        | 39.4              |
| TTD3  (with energy constraint) | 5.39           | 11.2        | 48.4              |

Summary
1. In terms of SSR, TTD3 outperforms TDDRL with or without energy constraint
2. In terms of SEE and Energy, TTD3 (with energy constraint) outperforms all other algorithms
3. Generally, TTD3 algorithm are better than TTDRL
4. Even with energy contraint (trade-off between energy consumption and SSR), TTD3 outperforms TDDRL in all aspects

\* Remarks: </br>
Note that the performance of DRL (especially twin DRL) has a big variation, sometimes you may get extremely good (or bad) performance </br>
The above benchmark results are averaged performance of several experiments, to get a more holistic understandings on the algorithms </br>
It is advised to use the benchmark UAV models we trained, for better convergence. </br>
This approach is consistent with the codes provided by [TDDRL](https://github.com/Brook1711/WCL-pulish-code)

## References and Acknowledgement

This work was supported by the **British Council** under **UK-ASEAN Institutional Links Early Career Researchers Scheme** with project number 913030644.

Both **RIS Simulation** and the **System Model** for this Research Project are based the research work provided by [Brook1711](https://github.com/Brook1711). </br>
We intended to fork the original repo for the system model (as stated below) as the base of this project. </br>
However, GitHub does not allow a forked repo to be private. </br>
Hence, we could not maintain our code based a forked version of the original repo, while keeping it private until the project is completed.
We would like to express our utmost gratitude for [Brook1711](https://github.com/Brook1711) and his co-authors for their research work.

### RIS Simulation
RIS Simulation is based on the following research work: </br>
[SimRIS Channel Simulator for Reconfigurable Intelligent Surface-Empowered Communication Systems](https://ieeexplore.ieee.org/document/9282349) </br>
The original simulation code is coded in matlab, this [GitHub repo](https://github.com/Brook1711/RIS_components) provides a Python version of the simulation.

### System Model: RIS-aided mmWave UAV communications
The simulation of the System Model is provided by the following research work: </br>
[Learning-Based Robust and Secure Transmission for Reconfigurable Intelligent Surface Aided Millimeter Wave UAV Communications](https://doi.org/10.1109/LWC.2021.3081464) </br>
The code is provided in this [GitHub repo](https://github.com/Brook1711/WCL-pulish-code).

### Rotary-Wing UAV
We can derive the Rotary-Wing UAV’s propulsion energy consumption based on the following research work: </br>
[Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV](https://doi.org/10.1109/LWC.2019.2916549)

### TD3
Main reference for TD3 implementation: </br>
[PyTorch/TensorFlow 2.0 for TD3](https://github.com/philtabor/Actor-Critic-Methods-Paper-To-Code/tree/master/TD3)


## TODO
- [x] Add argparse arguments to set drl algo and reward type
- [x] Add argparse arguments to set episode number
- [x] Add argparse arguments to set seeds for the two DRLs
- [x] Add argparse arguments to load pretrained DRL for UAV trajectory
- [x] Add benchmark/pretrained model
- [x] Project naming (use <DRL>_<Reward>_<Num> instead of using datetime format)
- [x] Remove saving "best model", there are no best model, only latest model
- [ ] The following codes can be used, but you have to manually change the filepath in the codes

`plot_ssr.py` is the python file to plot the final episode's SSR for the 4 benchmarks in the paper
```shell
# plot ssr
python3 plot_ssr.py
```

`plot_see.py` is the python file to plot the final episode's SSR for the 4 benchmarks in the paper
```shell
# plot see
python3 plot_see.py
```

`plot_traj.py` is the python file to plot the final episode's UAV trajectory for the 4 benchmarks in the paper
```shell
# plot UAV trajectory
python3 plot_traj.py
```

## Cite this repository
```
@INPROCEEDINGS{10118891,
  author={Tham, Mau-Luen and Wong, Yi Jie and Iqbal, Amjad and Ramli, Nordin Bin and Zhu, Yongxu and Dagiuklas, Tasos},
  booktitle={2023 IEEE Wireless Communications and Networking Conference (WCNC)}, 
  title={Deep Reinforcement Learning for Secrecy Energy- Efficient UAV Communication with Reconfigurable Intelligent Surface}, 
  year={2023},
  doi={10.1109/WCNC55385.2023.10118891}}
```

<details>
<summary>Star History</summary>
  
[![Star History Chart](https://api.star-history.com/svg?repos=yjwong1999/Twin-TD3&type=Date)](https://star-history.com/#yjwong1999/Twin-TD3&Date)

</details>


================================================
FILE: batch_eval.sh
================================================
#!/bin/bash

echo 

echo ddpg_ssr
python3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr --ep-num 300
echo ddpg_ssr_2
python3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr_2 --ep-num 300
echo ddpg_ssr_3
python3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr_3 --ep-num 300
echo ddpg_ssr_4
python3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr_4 --ep-num 300
echo ddpg_ssr_5
python3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr_5 --ep-num 300


echo td3_ssr
python3 load_and_plot.py --path data/storage/trained_uav/td3_ssr --ep-num 300
echo td3_ssr_2
python3 load_and_plot.py --path data/storage/trained_uav/td3_ssr_2 --ep-num 300
echo td3_ssr_3
python3 load_and_plot.py --path data/storage/trained_uav/td3_ssr_3 --ep-num 300
echo td3_ssr_4
python3 load_and_plot.py --path data/storage/trained_uav/td3_ssr_4 --ep-num 300
echo td3_ssr_5
python3 load_and_plot.py --path data/storage/trained_uav/td3_ssr_5 --ep-num 300


echo ddpg_see
python3 load_and_plot.py --path data/storage/trained_uav/ddpg_see --ep-num 300
echo ddpg_see_2
python3 load_and_plot.py --path data/storage/trained_uav/ddpg_see_2 --ep-num 300
echo ddpg_see_3
python3 load_and_plot.py --path data/storage/trained_uav/ddpg_see_3 --ep-num 300
echo ddpg_see_4
python3 load_and_plot.py --path data/storage/trained_uav/ddpg_see_4 --ep-num 300
echo ddpg_see_5
python3 load_and_plot.py --path data/storage/trained_uav/ddpg_see_5 --ep-num 300


echo td3_see
python3 load_and_plot.py --path data/storage/trained_uav/td3_see --ep-num 300
echo td3_see_2
python3 load_and_plot.py --path data/storage/trained_uav/td3_see_2 --ep-num 300
echo td3_see_3
python3 load_and_plot.py --path data/storage/trained_uav/td3_see_3 --ep-num 300
echo td3_see_4
python3 load_and_plot.py --path data/storage/trained_uav/td3_see_4 --ep-num 300
echo td3_see_5
python3 load_and_plot.py --path data/storage/trained_uav/td3_see_5 --ep-num 300




================================================
FILE: batch_train.sh
================================================
#!/bin/bash

echo 

echo ddpg_ssr
python3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav
echo ddpg_ssr_2
python3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav
echo ddpg_ssr_3
python3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav
echo ddpg_ssr_4
python3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav
echo ddpg_ssr_5
python3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav

echo td3_ssr
python3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav
echo td3_ssr_2
python3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav
echo td3_ssr_3
python3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav
echo td3_ssr_4
python3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav
echo td3_ssr_5
python3 main_train.py --drl td3 --reward ssr --ep-num 300 --trained-uav

echo ddpg_see
python3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav
echo ddpg_see_2
python3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav
echo ddpg_see_3
python3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav
echo ddpg_see_4
python3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav
echo ddpg_see_5
python3 main_train.py --drl ddpg --reward see --ep-num 300 --trained-uav

echo td3_see
python3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav
echo td3_see_2
python3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav
echo td3_see_3
python3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav
echo td3_see_4
python3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav
echo td3_see_5
python3 main_train.py --drl td3 --reward see --ep-num 300 --trained-uav


================================================
FILE: channel.py
================================================
import numpy as np
import math
import cmath
from math_tool import *

class mmWave_channel(object):
    """
    generate MmWave under UMi open
    input: distance, angle, pair entity object
    output: Instantaneous CSI
    """
    def __init__(self, transmitter, receiver, frequncy):
        """
        transmitter: object in entity.py
        receiver: object in entity.py
        
        """
        self.channel_name = ''
        self.n = 0
        self.sigma = 0
        self.transmitter = transmitter 
        self.receiver = receiver
        self.channel_type = self.init_type()    # 'UAV_RIS', 'UAV_user', 'UAV_attacker', 'RIS_user', 'RIS_attacker'
        
        # self.distance = np.linalg.norm(transmitter.coordinate - receiver.coordinate)
        self.frequncy = frequncy
        # init & updata path loss
        self.path_loss_normal = self.get_channel_path_loss()
        self.path_loss_dB = normal_to_dB(self.path_loss_normal)
        # init & update channel CSI matrix
        self.channel_matrix = self.get_estimated_channel_matrix()
        
    def init_type(self):
        channel_type = self.transmitter.type+'_'+self.receiver.type
        if channel_type == 'UAV_RIS' or channel_type == 'RIS_UAV':
            self.n = 2.2
            self.sigma = 3
            self.channel_name = 'H_UR'
        elif channel_type == 'UAV_user' or channel_type == 'UAV_attacker':
            self.n = 3.5
            self.sigma = 3
            if channel_type =='UAV_user':
                self.channel_name = 'h_U_k,' + str(self.transmitter.index)
            elif channel_type == 'UAV_attacker':
                self.channel_name = 'h_U_p,' + str(self.transmitter.index)
        elif channel_type == 'user_UAV' or channel_type == 'attacker_UAV':
            self.n = 3.5
            self.sigma = 3
            if channel_type =='user_UAV':
                self.channel_name = 'h_U_k,' + str(self.transmitter.index)
            elif channel_type == 'attacker_UAV':
                self.channel_name = 'h_U_p,' + str(self.transmitter.index)
                
        elif channel_type == 'RIS_user' or channel_type == 'RIS_attacker':
            self.n = 2.8
            self.sigma = 3
            if channel_type =='RIS_user':
                self.channel_name = 'h_R_k,' + str(self.transmitter.index)
            elif channel_type == 'RIS_attacker':
                self.channel_name = 'h_R_p,' + str(self.transmitter.index)        
        elif channel_type == 'user_RIS' or channel_type == 'attacker_RIS':
            self.n = 2.8
            self.sigma = 3
            if channel_type =='user_RIS':
                self.channel_name = 'h_R_k,' + str(self.transmitter.index)
            elif channel_type == 'attacker_RIS':
                self.channel_name = 'h_R_p,' + str(self.transmitter.index)  
        return channel_type

    def get_channel_path_loss(self):
        """
        calculate the path loss including shadow fading 
        (in normal form)
        """
        distance = np.linalg.norm(self.transmitter.coordinate - self.receiver.coordinate)
        PL = -20 * math.log10(4*math.pi/(3e8/self.frequncy)) - 10*self.n*math.log10(distance)
        shadow_loss = np.random.normal() * self.sigma
        # return dB_to_normal(PL - shadow_loss)
        return dB_to_normal(PL)

    def get_estimated_channel_matrix(self):
        """
        init & update channel matrix
        """
        # init matrix
        N_t = self.transmitter.ant_num
        N_r = self.receiver.ant_num
        channel_matrix = np.mat(np.ones(shape=(N_r,N_t),dtype=complex), dtype=complex)

        # get relevant coordinate receiver under transmitter system
        r_under_t_car_coor = get_coor_ref(\
        self.transmitter.coor_sys, \
        self.receiver.coordinate - self.transmitter.coordinate)
        # get relevant spherical_coordinate 
        r_t_r, r_t_theta, r_t_fai = cartesian_coordinate_to_spherical_coordinate(\
        cartesian_coordinate=r_under_t_car_coor\
        )

        # get relevant coordinate transmitter under receiver system
        t_under_r_car_coor = get_coor_ref(\
        #   remmber to Meet channel direction restrictions
        [-self.receiver.coor_sys[0], self.receiver.coor_sys[1], -self.receiver.coor_sys[2]],\
        self.transmitter.coordinate - self.receiver.coordinate)
        # get relevant spherical_coordinate 
        t_r_r, t_r_theta, t_r_fai = cartesian_coordinate_to_spherical_coordinate(\
        cartesian_coordinate=t_under_r_car_coor\
        )

        # calculate array response
        t_array_response = self.generate_array_response(self.transmitter, r_t_theta, r_t_fai)
        r_array_response = self.generate_array_response(self.receiver, t_r_theta, t_r_fai)
        array_response_product = r_array_response * t_array_response.H
        # get H_LOS
        #   get LOS path loss 
        PL = self.path_loss_normal
        
        #   get LOS phase shift
        LOS_fai = 2 * math.pi * self.frequncy * np.linalg.norm(self.transmitter.coordinate - self.receiver.coordinate) / 3e8
        channel_matrix = cmath.exp(1j*LOS_fai)* math.pow(PL, 0.5) * array_response_product
        
        return channel_matrix

    def generate_array_response(self, transceiver, theta, fai):
        """
        if the ant_type is 'UPA'
        generate_UPA_response
        if the ant_type is 'ULA'
        generate_ULA_response
        if the ant_type is 'single'
        generate_singleant_response
        """
        ant_type = transceiver.ant_type
        ant_num  = transceiver.ant_num

        if ant_type == 'UPA':
            row_num = int(math.sqrt(ant_num))
            Planar_response = np.mat(np.ones(shape=(ant_num, 1)), dtype=complex)
            for i in range(row_num):
                for j in range(row_num):
                    Planar_response[j+i*row_num,0] = cmath.exp(1j *\
                    (math.sin(theta) * math.cos(fai)*i*math.pi + math.sin(theta)*math.sin(fai))\
                    )
                
            return Planar_response
        elif ant_type == 'ULA':
            Linear_response = np.mat(np.ones(shape=(ant_num,1)), dtype=complex)
            for i in range(ant_num):
                Linear_response[i, 0] = cmath.exp(1j * math.sin(theta) * math.cos(fai)*i*math.pi)
            return Linear_response
        elif ant_type == 'single':
            return np.mat(np.array([1]))
        else:
            return False
        
    def update_CSI(self):
        """
        update pathloss and channel matrix
        """
        # init & updata path loss
        self.path_loss_normal = self.get_channel_path_loss()
        self.path_loss_dB = normal_to_dB(self.path_loss_normal)
        # init & update channel CSI matrix
        self.channel_matrix = self.get_estimated_channel_matrix()


================================================
FILE: data/.vscode/settings.json
================================================
{
    "python.pythonPath": "C:\\Users\\67039\\anaconda3\\python.exe"
}

================================================
FILE: data/data_test.py
================================================
import pandas as pd 
user_sheet = pd.read_excel('init_location.xlsx', sheet_name='user')
user_data = user_sheet

print(user_data)

================================================
FILE: data/readme.md
================================================
#### init_location.xlsx
- the config files for the initilization postion of each entity (such as UAV, users, attackers, RIS)

#### data_test.py (should change name)
- to read and show the initialization position of each entity


================================================
FILE: data/storage/readme.md
================================================
# This directory stores all experiments

- Each experiment will be saved as a file project in this directory
- benchmark directory stores all the pretrained DRL for UAV trajectory (agent 2 in the code), which will be used when --trained-uav flag is raised


================================================
FILE: data_manager.py
================================================
import numpy as np
import scipy.io
import pandas as pd
import os
import time, csv

class DataManager(object):
    """
    class to read and store simulation results
    before use, please create a direction under current file path './data'
    and must have a file 'init_location.xlsx' which contain the position of each entities
    """
    def __init__(self, store_list = ['beamforming_matrix', 'reflecting_coefficient', 'UAV_state', 'user_capacity'],file_path = './data', store_path = './data/storage', project_name = None):
        # 1 init location data
        self.store_list = store_list
        self.init_data_file = file_path + '/init_location.xlsx'
        if project_name is None:
            self.time_stemp = time.strftime('/%Y-%m-%d %H_%M_%S',time.localtime(time.time()))
            self.store_path = store_path + self.time_stemp 
        else:
            for i in range(1, 100, 1):
                if i == 1:
                    dir_name = store_path + '/' + project_name
                else:
                    dir_name = store_path + '/' + project_name + f'_{i}'
                if not os.path.isdir(dir_name):
                    self.store_path = dir_name
                    break
        os.makedirs(self.store_path) 
        # self.writer = pd.ExcelWriter(self.store_path + '/simulation_result.xlsx', engine='openpyxl')  # pylint: disable=abstract-class-instantiated 
        self.simulation_result_dic = {}
        self.init_format()

    def save_file(self, episode_cnt = 10):
        # record step counts per episode
        with open(self.store_path + "/step_num_per_episode.csv", "a", newline='') as f:
            writer = csv.writer(f)
            writer.writerow([len(list(self.simulation_result_dic.values())[0])])

        # when ended, auto save to .mat file
        scipy.io.savemat(self.store_path + '/simulation_result_ep_' + str(episode_cnt) + '.mat', {'result_' + str(episode_cnt):self.simulation_result_dic})
        self.simulation_result_dic = {}
        self.init_format()

    def save_meta_data(self, meta_dic):
        """
        save system and agent information
        """
        scipy.io.savemat(self.store_path + '/meta_data.mat', {'meta_data': meta_dic})
        
    def init_format(self):
        """
        used only one time in env.py
        """
        for store_item in self.store_list:
            self.simulation_result_dic.update({store_item:[]})

    def read_init_location(self, entity_type = 'user', index = 0):
        if entity_type == 'user' or 'attacker' or 'RIS' or 'RIS_norm_vec' or 'UAV':
            return np.array([\
            pd.read_excel(self.init_data_file, sheet_name=entity_type)['x'][index],\
            pd.read_excel(self.init_data_file, sheet_name=entity_type)['y'][index],\
            pd.read_excel(self.init_data_file, sheet_name=entity_type)['z'][index]])
        else:
            return None
    
    def store_data(self, row_data, value_name):
        """
        docstring
        """
        self.simulation_result_dic[value_name].append(row_data)


================================================
FILE: ddpg.py
================================================
import os
import torch as T
#import torch.cuda as T
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from data_manager import DataManager
class OUActionNoise(object):
    def __init__(self, mu, sigma=0.15, theta=.2, dt=1e-2, x0=None):
        self.theta = theta
        self.mu = mu
        self.sigma = sigma
        self.dt = dt
        self.x0 = x0
        self.reset()

    def __call__(self):
        x = self.x_prev + self.theta * (self.mu - self.x_prev) * self.dt + \
            self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.mu.shape)
        self.x_prev = x
        return x

    def reset(self):
        self.x_prev = self.x0 if self.x0 is not None else np.zeros_like(self.mu)

    def __repr__(self):
        return 'OrnsteinUhlenbeckActionNoise(mu={}, sigma={})'.format(
                                                            self.mu, self.sigma)

class AWGNActionNoise(object):
    def __init__(self, mu = 0, sigma=1):
        self.mu = mu
        self.sigma = sigma

    def __call__(self):
        #self.mu = mu
        #self.sigma = sigma
        x = np.random.normal(size=self.mu.shape) * self.sigma
        return x

class ReplayBuffer(object):
    def __init__(self, max_size, input_shape, n_actions):
        self.mem_size = max_size
        self.mem_cntr = 0
        self.state_memory = np.zeros((self.mem_size, *input_shape))
        self.new_state_memory = np.zeros((self.mem_size, *input_shape))
        self.action_memory = np.zeros((self.mem_size, n_actions))
        self.reward_memory = np.zeros(self.mem_size)
        self.terminal_memory = np.zeros(self.mem_size, dtype=np.float32)

    def store_transition(self, state, action, reward, state_, done):
        index = self.mem_cntr % self.mem_size
        self.state_memory[index] = state
        self.new_state_memory[index] = state_
        self.action_memory[index] = action
        self.reward_memory[index] = reward
        self.terminal_memory[index] = 1 - done
        self.mem_cntr += 1

    def sample_buffer(self, batch_size):
        max_mem = min(self.mem_cntr, self.mem_size)

        batch = np.random.choice(max_mem, batch_size)

        states = self.state_memory[batch]
        actions = self.action_memory[batch]
        rewards = self.reward_memory[batch]
        states_ = self.new_state_memory[batch]
        terminal = self.terminal_memory[batch]

        return states, actions, rewards, states_, terminal

class CriticNetwork(nn.Module):
    def __init__(self, beta, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4_dims, n_actions, name,
                 chkpt_dir='C:\\demo\\IRS_DDPG_minimal\\main_foder\\tmp\\ddpg', load_file = ''):
        super(CriticNetwork, self).__init__()
        self.input_dims = input_dims
        self.fc1_dims = fc1_dims
        self.fc2_dims = fc2_dims
        self.fc3_dims = fc3_dims
        self.fc4_dims = fc4_dims
        self.n_actions = n_actions
        self.checkpoint_file = os.path.join(chkpt_dir,name+'_ddpg')
        self.load_file = 'C:\\demo\\other_branch\\Learning-based_Secure_Transmission_for_RIS_Aided_mmWave-UAV_Communications_with_Imperfect_CSI\\data\\mannal_store\\models\\Critic_UAV_ddpg'
        self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims)
        f1 = 1./np.sqrt(self.fc1.weight.data.size()[0])
        T.nn.init.uniform_(self.fc1.weight.data, -f1, f1)
        T.nn.init.uniform_(self.fc1.bias.data, -f1, f1)
        #self.fc1.weight.data.uniform_(-f1, f1)
        #self.fc1.bias.data.uniform_(-f1, f1)
        self.bn1 = nn.LayerNorm(self.fc1_dims)

        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)
        f2 = 1./np.sqrt(self.fc2.weight.data.size()[0])
        #f2 = 0.002
        T.nn.init.uniform_(self.fc2.weight.data, -f2, f2)
        T.nn.init.uniform_(self.fc2.bias.data, -f2, f2)
        #self.fc2.weight.data.uniform_(-f2, f2)
        #self.fc2.bias.data.uniform_(-f2, f2)
        self.bn2 = nn.LayerNorm(self.fc2_dims)

        self.fc3 = nn.Linear(self.fc2_dims, self.fc3_dims)
        f3 = 1./np.sqrt(self.fc3.weight.data.size()[0])
        T.nn.init.uniform_(self.fc3.weight.data, -f3, f3)
        T.nn.init.uniform_(self.fc3.bias.data, -f3, f3)
        #self.fc2.weight.data.uniform_(-f2, f2)
        #self.fc2.bias.data.uniform_(-f2, f2)
        self.bn3 = nn.LayerNorm(self.fc3_dims)

        self.fc4 = nn.Linear(self.fc3_dims, self.fc4_dims)
        f4 = 1./np.sqrt(self.fc4.weight.data.size()[0])
        T.nn.init.uniform_(self.fc4.weight.data, -f4, f4)
        T.nn.init.uniform_(self.fc4.bias.data, -f4, f4)
        #self.fc2.weight.data.uniform_(-f2, f2)
        #self.fc2.bias.data.uniform_(-f2, f2)
        self.bn4 = nn.LayerNorm(self.fc4_dims)

        self.action_value = nn.Linear(self.n_actions, self.fc4_dims)
        f5 = 0.003
        self.q = nn.Linear(self.fc4_dims, 1)
        T.nn.init.uniform_(self.q.weight.data, -f5, f5)
        T.nn.init.uniform_(self.q.bias.data, -f5, f5)
        #self.q.weight.data.uniform_(-f3, f3)
        #self.q.bias.data.uniform_(-f3, f3)

        self.optimizer = optim.Adam(self.parameters(), lr=beta)
#        if torch.cuda.available():
#            import torch.cuda as T
#        else:
#            import torch as T
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')

        self.to(self.device)

    def forward(self, state, action):
        state_value = self.fc1(state)
        state_value = self.bn1(state_value)
        state_value = F.relu(state_value)
        state_value = self.fc2(state_value)
        state_value = self.bn2(state_value)
        state_value = F.relu(state_value)
        state_value = self.fc3(state_value)
        state_value = self.bn3(state_value)
        state_value = F.relu(state_value)
        state_value = self.fc4(state_value)
        state_value = self.bn4(state_value)

        action_value = F.relu(self.action_value(action))
        state_action_value = F.relu(T.add(state_value, action_value))
        state_action_value = self.q(state_action_value)

        return state_action_value

    def save_checkpoint(self):
        print('... saving checkpoint ...')
        T.save(self.state_dict(), self.checkpoint_file)

    def load_checkpoint(self,load_file = ''):
        print('... loading checkpoint ...')
        if T.cuda.is_available():
            self.load_state_dict(T.load(load_file))
        else:
            self.load_state_dict(T.load(load_file, map_location=T.device('cpu')))

class ActorNetwork(nn.Module):
    def __init__(self, alpha, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4_dims, n_actions, name,
                 chkpt_dir='C:\\demo\\IRS_DDPG_minimal\\main_foder\\tmp\\ddpg', load_file = ''):
        super(ActorNetwork, self).__init__()
        self.input_dims = input_dims
        self.fc1_dims = fc1_dims
        self.fc2_dims = fc2_dims
        self.fc3_dims = fc3_dims
        self.fc4_dims = fc4_dims        
        self.n_actions = n_actions
        self.checkpoint_file = os.path.join(chkpt_dir,name+'_ddpg')
        self.load_file = 'C:\\demo\\other_branch\\Learning-based_Secure_Transmission_for_RIS_Aided_mmWave-UAV_Communications_with_Imperfect_CSI\\data\\mannal_store\\models\\Actor_UAV_ddpg'
        self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims)
        f1 = 1./np.sqrt(self.fc1.weight.data.size()[0])
#        T.nn.init.uniform_(self.fc1.weight.data, -f1, f1)
#        T.nn.init.uniform_(self.fc1.bias.data, -f1, f1)
        self.fc1.weight.data.uniform_(-f1, f1)
        self.fc1.bias.data.uniform_(-f1, f1)
        self.bn1 = nn.LayerNorm(self.fc1_dims)

        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)
        #f2 = 0.002
        f2 = 1./np.sqrt(self.fc2.weight.data.size()[0])
#        T.nn.init.uniform_(self.fc2.weight.data, -f2, f2)
#        T.nn.init.uniform_(self.fc2.bias.data, -f2, f2)
        self.fc2.weight.data.uniform_(-f2, f2)
        self.fc2.bias.data.uniform_(-f2, f2)
        self.bn2 = nn.LayerNorm(self.fc2_dims)

        self.fc3 = nn.Linear(self.fc2_dims, self.fc3_dims)
        f3 = 1./np.sqrt(self.fc3.weight.data.size()[0])
#        T.nn.init.uniform_(self.fc3.weight.data, -f3, f3)
#        T.nn.init.uniform_(self.fc3.bias.data, -f3, f3)
        self.fc2.weight.data.uniform_(-f2, f2)
        self.fc2.bias.data.uniform_(-f2, f2)
        self.bn3 = nn.LayerNorm(self.fc3_dims)

        self.fc4 = nn.Linear(self.fc3_dims, self.fc4_dims)
        f4 = 1./np.sqrt(self.fc4.weight.data.size()[0])
#        T.nn.init.uniform_(self.fc4.weight.data, -f4, f4)
#        T.nn.init.uniform_(self.fc4.bias.data, -f4, f4)
        self.fc2.weight.data.uniform_(-f2, f2)
        self.fc2.bias.data.uniform_(-f2, f2)
        self.bn4 = nn.LayerNorm(self.fc4_dims)

        #f3 = 0.004
        f5 = 0.003
        self.mu = nn.Linear(self.fc4_dims, self.n_actions)
#        T.nn.init.uniform_(self.mu.weight.data, -f5, f5)
#        T.nn.init.uniform_(self.mu.bias.data, -f5, f5)
        self.mu.weight.data.uniform_(-f3, f3)
        self.mu.bias.data.uniform_(-f3, f3)

        self.optimizer = optim.Adam(self.parameters(), lr=alpha)
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')

        self.to(self.device)

    def forward(self, state):
        x = self.fc1(state)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.fc3(x)
        x = self.bn3(x)
        x = F.relu(x)
        x = self.fc4(x)
        x = self.bn4(x)
        x = F.relu(x)
        x = T.tanh(self.mu(x))

        return x

    def save_checkpoint(self):
        print('... saving checkpoint ...')
        T.save(self.state_dict(), self.checkpoint_file)

    def load_checkpoint(self, load_file=''):
        print('... loading checkpoint ...')
        if T.cuda.is_available():
            self.load_state_dict(T.load(load_file))
        else:
            self.load_state_dict(T.load(load_file, map_location=T.device('cpu')))

class Agent(object):
    def __init__(self, alpha, beta, input_dims, tau, env, gamma=0.99,
                 n_actions=2, max_size=1000000, layer1_size=400,
                 layer2_size=300, layer3_size=256, layer4_size=128, batch_size=64, noise = 'AWGN', agent_name = 'default', load_file = ''):
        self.load_file = load_file
        self.layer1_size = layer1_size
        self.layer2_size = layer2_size
        self.layer3_size = layer3_size
        self.layer4_size = layer4_size
        self.gamma = gamma
        self.tau = tau
        self.memory = ReplayBuffer(max_size, input_dims, n_actions)
        self.batch_size = batch_size

        self.actor = ActorNetwork(alpha, input_dims, layer1_size,
                                  layer2_size, layer3_size, layer4_size, n_actions=n_actions,
                                  name='Actor_' + agent_name,chkpt_dir=env.data_manager.store_path )
        self.critic = CriticNetwork(beta, input_dims, layer1_size,
                                    layer2_size, layer3_size, layer4_size, n_actions=n_actions,
                                    name='Critic_' + agent_name,chkpt_dir=env.data_manager.store_path)

        self.target_actor = ActorNetwork(alpha, input_dims, layer1_size,
                                         layer2_size, layer3_size, layer4_size, n_actions=n_actions,
                                         name='TargetActor_' + agent_name,chkpt_dir=env.data_manager.store_path)
        self.target_critic = CriticNetwork(beta, input_dims, layer1_size,
                                           layer2_size, layer3_size, layer4_size, n_actions=n_actions,
                                           name='TargetCritic_' + agent_name,chkpt_dir=env.data_manager.store_path)
        if noise == 'OU':
            self.noise = OUActionNoise(mu=np.zeros(n_actions))
        elif noise == 'AWGN':
            self.noise = AWGNActionNoise(mu = np.zeros(n_actions))
        # tau = 1 means copy parameters to target
        self.update_network_parameters(tau=1)

    def choose_action(self, observation, greedy=0.5, epsilon = 1):
        self.actor.eval()
        observation = T.tensor(observation, dtype=T.float).to(self.actor.device)
        mu = self.actor.forward(observation).to(self.actor.device)
        mu_prime = mu + T.tensor(greedy * self.noise(),
                                 dtype=T.float).to(self.actor.device)
        self.actor.train()
        return mu_prime.cpu().detach().numpy()


    def remember(self, state, action, reward, new_state, done):
        self.memory.store_transition(state, action, reward, new_state, done)

    def learn(self):
        if self.memory.mem_cntr < self.batch_size:
            return
        # the done here is opposite of the done in the env
        state, action, reward, new_state, done = \
                                      self.memory.sample_buffer(self.batch_size)

        # trun s, a, r, new_s into tensor
        reward = T.tensor(reward, dtype=T.float).to(self.critic.device)
        done = T.tensor(done).to(self.critic.device)
        new_state = T.tensor(new_state, dtype=T.float).to(self.critic.device)
        action = T.tensor(action, dtype=T.float).to(self.critic.device)
        state = T.tensor(state, dtype=T.float).to(self.critic.device)

        # trun on evaliation mode of target actor, target critic, critic net
        # fix these three nets
        self.target_actor.eval()
        self.target_critic.eval()
        self.critic.eval()

        # argmax_action*(Q(new_state,action*)),i.e. pi(s|pi),i.e. stratage choose the action* maxmium the future Q 
        target_actions = self.target_actor.forward(new_state)
        # caculate the Q(new_state, new_action)
        critic_value_ = self.target_critic.forward(new_state, target_actions)
        # caculate the Q(state, action)
        critic_value = self.critic.forward(state, action)

        target = []
        for j in range(self.batch_size):
            target.append(reward[j] + self.gamma*critic_value_[j]*done[j])
        target = T.tensor(target).to(self.critic.device)
        target = target.view(self.batch_size, 1)

        # here update the critic net using mse of (r + gamma * Q_argmax_a*(newstate, a*)) - Q(state, action)
        self.critic.train()
        self.critic.optimizer.zero_grad()
        critic_loss = F.mse_loss(target, critic_value)
        critic_loss.backward()
        self.critic.optimizer.step()

        # here update the actor net by policy gradient
        # first fix the critic net
        self.critic.eval()
        self.actor.optimizer.zero_grad()
        mu = self.actor.forward(state)
        self.actor.train()
        actor_loss = -self.critic.forward(state, mu)
        actor_loss = T.mean(actor_loss)
        actor_loss.backward()
        self.actor.optimizer.step()

        self.update_network_parameters()

    def update_network_parameters(self, tau=None):
        if tau is None:
            tau = self.tau

        actor_params = self.actor.named_parameters()
        critic_params = self.critic.named_parameters()
        target_actor_params = self.target_actor.named_parameters()
        target_critic_params = self.target_critic.named_parameters()

        critic_state_dict = dict(critic_params)
        actor_state_dict = dict(actor_params)
        target_critic_dict = dict(target_critic_params)
        target_actor_dict = dict(target_actor_params)

        for name in critic_state_dict:
            critic_state_dict[name] = tau*critic_state_dict[name].clone() + \
                                      (1-tau)*target_critic_dict[name].clone()

        self.target_critic.load_state_dict(critic_state_dict)

        for name in actor_state_dict:
            actor_state_dict[name] = tau*actor_state_dict[name].clone() + \
                                      (1-tau)*target_actor_dict[name].clone()
        self.target_actor.load_state_dict(actor_state_dict)

        """
        #Verify that the copy assignment worked correctly
        target_actor_params = self.target_actor.named_parameters()
        target_critic_params = self.target_critic.named_parameters()
        critic_state_dict = dict(target_critic_params)
        actor_state_dict = dict(target_actor_params)
        print('\nActor Networks', tau)
        for name, param in self.actor.named_parameters():
            print(name, T.equal(param, actor_state_dict[name]))
        print('\nCritic Networks', tau)
        for name, param in self.critic.named_parameters():
            print(name, T.equal(param, critic_state_dict[name]))
        input()
        """
    def save_models(self):
        self.actor.save_checkpoint()
        self.target_actor.save_checkpoint()
        self.critic.save_checkpoint()
        self.target_critic.save_checkpoint()

    def load_models(self, load_file_actor = '',load_file_critic =''):
        self.actor.load_checkpoint(load_file = load_file_actor)
        self.target_actor.load_checkpoint(load_file = load_file_actor)
        self.critic.load_checkpoint(load_file = load_file_critic)
        self.target_critic.load_checkpoint(load_file = load_file_critic)

    def check_actor_params(self):
        current_actor_params = self.actor.named_parameters()
        current_actor_dict = dict(current_actor_params)
        original_actor_dict = dict(self.original_actor.named_parameters())
        original_critic_dict = dict(self.original_critic.named_parameters())
        current_critic_params = self.critic.named_parameters()
        current_critic_dict = dict(current_critic_params)
        print('Checking Actor parameters')

        for param in current_actor_dict:
            print(param, T.equal(original_actor_dict[param], current_actor_dict[param]))
        print('Checking critic parameters')
        for param in current_critic_dict:
            print(param, T.equal(original_critic_dict[param], current_critic_dict[param]))
        input()


================================================
FILE: entity.py
================================================
import numpy as np
import math
#from math_tool import *

class UAV(object):
    """
    UAV object with coordinate 
    And with ULA antenas, default 8 
    And limited power
    And with fixed rotation angle
    """
    def __init__(self, coordinate, index = 0, rotation = 0, ant_num=16, ant_type = 'ULA', max_movement_per_time_slot = 0.5):
        """
        coordinate is the init coordinate of UAV, meters, np.array
        """
        self.max_movement_per_time_slot = max_movement_per_time_slot
        self.type = 'UAV'
        self.coordinate = coordinate
        self.rotation = rotation
        self.ant_num = ant_num
        self.ant_type = ant_type
        self.coor_sys = [np.array([1, 0, 0]), np.array([0, -1, 0]), np.array([0, 0, -1])]
        self.index = index

        # init beamforming matrix in UAV (must be inited in env.py)
        self.G = np.mat(np.zeros((ant_num, 1)))
        self.G_Pmax = 0

    def reset(self, coordinate):
        """
        reset UAV coordinate
        """
        self.coordinate = coordinate
        
    def update_coor_sys(self, delta_angle):
        """
        used in function move to update the relevant coordinate system 
        """
        self.rotation = self.rotation + delta_angle
        coor_sys_x = np.array([\
        math.cos(self.rotation),\
        math.sin(self.rotation),\
        0])
        coor_sys_z = np.array([\
        0,\
        0,\
        -1])
        coor_sys_y = np.cross(coor_sys_z, coor_sys_x)
        self.coor_sys = np.array([coor_sys_x,coor_sys_y,coor_sys_z])
        
    def update_coordinate(self, distance_delta_d, direction_fai):
        """
        used in function move to update UAV cordinate
        """
        delta_x = distance_delta_d * math.cos(direction_fai)
        delta_y = distance_delta_d * math.sin(direction_fai)
        self.coordinate[0] += delta_x
        self.coordinate[1] += delta_y

    def move(self, distance_delta_d, direction_fai, delta_angle = 0):
        """
        preform the 2D movement every step
        """
        self.update_coordinate(distance_delta_d, direction_fai)
        self.update_coor_sys(delta_angle)

class RIS(object):
    """
    reconfigrable intelligent surface
    with N reflecting elements, UPA, default 4 X 4 = 16
    continues phase shift
    """
    def __init__(self, coordinate, coor_sys_z, index = 0, ant_num=36, ant_type = 'UPA'):
        """
        coordinate is the init coordinate of with N reflecting elements, meters, np.array
        norm_vec is the normal vector of the reflecting direction
        !!! ant_num Must be the square of a certain int number
        """
        self.type = 'RIS'
        self.coordinate = coordinate
        self.ant_num = ant_num
        self.ant_type = ant_type
        coor_sys_z = coor_sys_z / np.linalg.norm(coor_sys_z)
        coor_sys_x = np.cross(coor_sys_z, np.array([0,0,1]))
        coor_sys_x = coor_sys_x / np.linalg.norm(coor_sys_x)
        coor_sys_y = np.cross(coor_sys_z, coor_sys_x)
        self.coor_sys = [coor_sys_x,coor_sys_y,coor_sys_z]
        self.index = index

        # init reflecting phase shift
        self.Phi = np.mat(np.diag(np.ones(self.ant_num, dtype=complex)), dtype = complex)

class User(object):
    """
    user with single antenas
    """
    def __init__(self, coordinate, index, ant_num = 1, ant_type = 'single'):
        """
        coordinate is the init coordinate of user, meters, np.array
        ant_num is the antenas number of user
        """
        self.type = 'user'
        self.coordinate = coordinate
        self.ant_num = ant_num
        self.ant_type = ant_type
        self.index = index
        self.coor_sys = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])]

        # init the capacity
        self.capacity = 0
        self.secure_capacity = 0
        self.QoS_constrain = 0
        # init the comprehensive_channel, (must used in env.py to init)
        self.comprehensive_channel = 0
        # init receive noise sigma in dB
        self.noise_power = -114

    def reset(self, coordinate):
        """
        reset user coordinate
        """
        self.coordinate = coordinate
        
    def update_coordinate(self, distance_delta_d, direction_fai):
        """
        used in function move to update UAV cordinate
        """
        delta_x = distance_delta_d * math.cos(direction_fai)
        delta_y = distance_delta_d * math.sin(direction_fai)
        self.coordinate[0] += delta_x
        self.coordinate[1] += delta_y

    def move(self, distance_delta_d, direction_fai):
        """
        preform the 2D movement every step
        """
        self.update_coordinate(distance_delta_d, direction_fai)
        
class Attacker(object):
    """
    Attacker with single antenas
    """
    def __init__(self, coordinate, index, ant_num = 1, ant_type= 'single'):
        """
        coordinate is the init coordinate of Attacker, meters, np.array
        ant_num is the antenas number of Attacker
        """
        self.type = 'attacker'
        self.coordinate = coordinate
        self.ant_num = ant_num
        self.ant_type = ant_type
        self.index = index
        self.coor_sys = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])]

        # init the capacity, this is a K length np.array ,shape: (K,)
        # represent the attack rate for kth user, (must init in env.py)
        self.capacity = 0
        self.comprehensive_channel = 0
        # init receive noise sigma in dBmW
        self.noise_power = -114

    def reset(self, coordinate):
        """
        reset attacker coordinate
        """
        self.coordinate = coordinate

    def update_coordinate(self, distance_delta_d, direction_fai):
        """
        used in function move to update UAV cordinate
        """
        delta_x = distance_delta_d * math.cos(direction_fai)
        delta_y = distance_delta_d * math.sin(direction_fai)
        self.coordinate[0] += delta_x
        self.coordinate[1] += delta_y

    def move(self, distance_delta_d, direction_fai):
        """
        preform the 2D movement every step
        """
        self.update_coordinate(distance_delta_d, direction_fai)

================================================
FILE: env.py
================================================
#%matplotlib inline
import numpy as np
from entity import *
from channel import *
from math_tool import *
from datetime import datetime
from mpl_toolkits import mplot3d
import matplotlib.pyplot as plt
from render import Render
from data_manager import DataManager
# s.t every simulition is the same model
np.random.seed(2)

######################################################
# new for energy 
# energy related parameters of rotary-wing UAV
# based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV
P_i = 790.6715
P_0 = 580.65
U2_tip = (200) ** 2
s = 0.05
d_0 = 0.3
p = 1.225
A = 0.79
delta_time = 0.1/1000 #0.1ms

# add ons hover veloctiy
# based on https://www.intechopen.com/chapters/57483
m = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F
g = 9.81 # gravity
T = m * g # thrust
v_0 = (T / (A * 2 * p)) ** 0.5

def get_energy_consumption(v_t):
    '''
    arg
    1) v_t = displacement per time slot
    '''
    energy_1 = P_0 \
                + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \
                + 0.5 * d_0 * p * s * A * (abs(v_t))**3
    
    energy_2 = P_i * ((
                    (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \
                    - (abs(v_t) ** 2) / (2 * (v_0 **2)) \
                ) ** 0.5)
    
    energy = delta_time * (energy_1 + energy_2)
    return energy 

ENERGY_MIN = get_energy_consumption(0.25)
ENERGY_MAX = get_energy_consumption(0)

######################################################


class MiniSystem(object):
#class MiniSystem(K=1):
    """
    define mini RIS communication system with one UAV
        and one RIS and one user, one attacker
    """
    def __init__(self, UAV_num = 1, RIS_num = 1, user_num = 1, attacker_num = 1, fre = 28e9, \
                 RIS_ant_num = 16, UAV_ant_num=8, if_dir_link = 1, if_with_RIS = True, \
                 if_move_users = True, if_movements = True, reverse_x_y = (True, True), \
                 if_UAV_pos_state = True, reward_design = 'ssr', project_name = None, step_num=100):
        self.if_dir_link = if_dir_link
        self.if_with_RIS = if_with_RIS
        self.if_move_users = if_move_users
        self.if_movements = if_movements
        self.if_UAV_pos_state = if_UAV_pos_state
        self.reverse_x_y = reverse_x_y
        self.user_num = user_num
        self.attacker_num = attacker_num
        self.border = [(-25,25), (0, 50)]
        # 1.init entities: 1 UAV, 1 RIS, many users and attackers
        self.data_manager = DataManager(file_path='./data', project_name = project_name, \
        store_list = ['beamforming_matrix', 'reflecting_coefficient', 'UAV_state', 'user_capacity', 'secure_capacity', 'attaker_capacity','G_power', 'reward','UAV_movement'])
        # 1.1 init UAV position and beamforming matrix
        self.UAV = UAV(
            coordinate=self.data_manager.read_init_location('UAV', 0), 
            ant_num= UAV_ant_num, 
            max_movement_per_time_slot=0.25)
        self.UAV.G = np.mat(np.ones((self.UAV.ant_num, user_num), dtype=complex), dtype=complex)
        self.power_factor = 100
        self.UAV.G_Pmax =  np.trace(self.UAV.G * self.UAV.G.H) * self.power_factor
        # 1.2 init RIS
        self.RIS = RIS(\
        coordinate=self.data_manager.read_init_location('RIS', 0), \
        coor_sys_z=self.data_manager.read_init_location('RIS_norm_vec', 0), \
        ant_num=RIS_ant_num)
        # 1.3 init users
        self.user_list = []
        
        for i in range(user_num):
            user_coordinate = self.data_manager.read_init_location('user', i)
            user = User(coordinate=user_coordinate, index=i)
            user.noise_power = -114
            self.user_list.append(user)

        # 1.4 init attackers
        self.attacker_list = []
        
        for i in range(attacker_num):
            attacker_coordinate = self.data_manager.read_init_location('attacker', i)
            attacker = Attacker(coordinate=attacker_coordinate, index=i)
            attacker.capacity = np.zeros((user_num))
            attacker.noise_power = -114
            self.attacker_list.append(attacker)
        # 1.5 generate the eavesdrop capacity array , shape: P X K
        self.eavesdrop_capacity_array= np.zeros((attacker_num, user_num))
        
        # 1.6 reward design
        self.reward_design = reward_design # reward_design is ['ssr' or 'see']

        # 1.7 step_num
        self.step_num = step_num
        
        # 2.init channel
        self.H_UR = mmWave_channel(self.UAV, self.RIS, fre)
        self.h_U_k = []
        self.h_R_k = []
        self.h_U_p = []
        self.h_R_p = []
        for user_k in self.user_list:
            self.h_U_k.append(mmWave_channel(user_k, self.UAV, fre))
            self.h_R_k.append(mmWave_channel(user_k, self.RIS, fre))
        for attacker_p in self.attacker_list:
            self.h_U_p.append(mmWave_channel(attacker_p, self.UAV, fre))
            self.h_R_p.append(mmWave_channel(attacker_p, self.RIS, fre))

        # 3 update user and attaker channel capacity
        self.update_channel_capacity()

        # 4 draw system
        self.render_obj = Render(self)      
        
    def reset(self):
        """
        reset UAV, users, attackers, beamforming matrix, reflecting coefficient
        """
        # 1 reset UAV
        self.UAV.reset(coordinate=self.data_manager.read_init_location('UAV', 0))
        # 2 reset users
        for i in range(self.user_num):
            user_coordinate = self.data_manager.read_init_location('user', i)
            self.user_list[i].reset(coordinate=user_coordinate)
        # 3 reset attackers
        for i in range(self.attacker_num):
            attacker_coordinate = self.data_manager.read_init_location('attacker', i)
            self.attacker_list[i].reset(coordinate=attacker_coordinate)
        # 4 reset beamforming matrix
        self.UAV.G = np.mat(np.ones((self.UAV.ant_num, self.user_num), dtype=complex), dtype=complex)
        self.UAV.G_Pmax = np.trace(self.UAV.G * self.UAV.G.H) * self.power_factor
        # 5 reset reflecting coefficient
        """self.RIS = RIS(\
        coordinate=self.data_manager.read_init_location('RIS', 0), \
        coor_sys_z=self.data_manager.read_init_location('RIS_norm_vec', 0), \
        ant_num=16)"""
        self.RIS.Phi = np.mat(np.diag(np.ones(self.RIS.ant_num, dtype=complex)), dtype = complex)
        # 6 reset time
        self.render_obj.t_index = 0
        # 7 reset CSI
        self.H_UR.update_CSI()
        for h in self.h_U_k + self.h_U_p + self.h_R_k + self.h_R_p:
            h.update_CSI()
        # 8 reset capcaity
        self.update_channel_capacity()

    def step(self, action_0 = 0, action_1 = 0, G = 0, Phi = 0, set_pos_x = 0, set_pos_y = 0):
        """
        test step only move UAV and update channel
        """
        # 0 update render
        
        self.render_obj.t_index += 1
        # 1 update entities
        
        if self.if_move_users:
            self.user_list[0].update_coordinate(0.2, -1/2 * math.pi)
            self.user_list[1].update_coordinate(0.2, -1/2 * math.pi)

        if self.if_movements:
            move_x = action_0 * self.UAV.max_movement_per_time_slot
            move_y = action_1 * self.UAV.max_movement_per_time_slot
            
            ######################################################
            # new for energy 
            v_t = (move_x ** 2 + move_y ** 2) ** 0.5
            #self.data_manager.store_data([v_t],'velocity')
            ######################################################

            if self.reverse_x_y[0]:
                move_x = -move_x
            
            if self.reverse_x_y[1]:
                move_y = -move_y
                
            self.UAV.coordinate[0] +=move_x
            self.UAV.coordinate[1] +=move_y
            self.data_manager.store_data([move_x, move_y], 'UAV_movement')
        else:
            set_pos_x = map_to(set_pos_x, (-1, 1), self.border[0])
            set_pos_y = map_to(set_pos_y, (-1, 1), self.border[1])
            self.UAV.coordinate[0] = set_pos_x
            self.UAV.coordinate[1] = set_pos_y

        # 2 update channel CSI
        
        for h in self.h_U_k + self.h_U_p + self.h_R_k + self.h_R_p:
            h.update_CSI()
        # !!! test to make direct link zero
        if self.if_dir_link == 0:
            for h in self.h_U_k + self.h_U_p:
                h.channel_matrix = np.mat(np.zeros(shape = np.shape(h.channel_matrix)), dtype=complex)
        if self.if_with_RIS == False:
            self.H_UR.channel_matrix = np.mat(np.zeros((self.RIS.ant_num, self.UAV.ant_num)), dtype=complex)
        else:
            self.H_UR.update_CSI()
        # 3 update beamforming matrix & reflecting phase shift
        """
        self.UAV.G = G
        self.RIS.Phi = Phi
        """
        self.UAV.G = convert_list_to_complex_matrix(G, (self.UAV.ant_num, self.user_num)) * math.pow(self.power_factor, 0.5)
        
        # fix beamforming matrix
        #self.UAV.G = np.mat(np.ones((self.UAV.ant_num, self.user_num), dtype=complex), dtype=complex) * math.pow(self.power_factor, 0.5)
        if self.if_with_RIS:
            self.RIS.Phi = convert_list_to_complex_diag(Phi, self.RIS.ant_num)
        # 4 update channel capacity in every user and attacker
        self.update_channel_capacity()
        # 5 store current system state to .mat
        self.store_current_system_sate()
        # 6 get new state
        new_state = self.observe()
        # 7 get reward
        reward = self.reward()
        
        # 7.1 reward with energy efficiency
        ######################################################
        if self.reward_design == 'see':
            # new for see
            energy = energy_raw = get_energy_consumption(v_t)
            energy -= ENERGY_MIN
            energy /= (ENERGY_MAX - ENERGY_MIN)
            energy_penalty = -1 * 0.1 * abs(reward) * energy # -1 * 0.1 * reward * energy
            if reward > 0:
                reward += energy_penalty
        ######################################################
        
        # 8 calculate if UAV is cross the bourder
        reward = math.tanh(reward) # new for energy (ori not commented)
        done = False
        x, y = self.UAV.coordinate[0:2]
        if x < self.border[0][0] or x > self.border[0][1]:
            done = True
            reward = -10
        if y < self.border[1][0] or y > self.border[1][1]:
            done = True
            reward = -10
        self.data_manager.store_data([reward],'reward')
        return new_state, reward, done, []

    def reward(self):
        """
        used in function step to get the reward of current step
        """
        reward = 0
        reward_ = 0
        P = np.trace(self.UAV.G * self.UAV.G.H)
        if abs(P) > abs(self.UAV.G_Pmax) :
            reward = abs(self.UAV.G_Pmax) - abs(P)
            reward /= self.power_factor 
        else:
            for user in self.user_list:
                r = user.capacity - max(self.eavesdrop_capacity_array[:, user.index])
                if r < user.QoS_constrain:
                    reward_ += r - user.QoS_constrain
                else:
                    reward += r/(self.user_num*2)
            if reward_ < 0:
                reward = reward_ * self.user_num * 10
     
        return reward
    
    def observe(self):
        """
        used in function main to get current state
        the state is a list with 
        """
        # users' and attackers' comprehensive channel
        comprehensive_channel_elements_list = []
        for entity in self.user_list + self.attacker_list:
            tmp_list = list(np.array(np.reshape(entity.comprehensive_channel, (1,-1)))[0])
            comprehensive_channel_elements_list += list(np.real(tmp_list)) + list(np.imag(tmp_list)) 
        UAV_position_list = []
        if self.if_UAV_pos_state:
            UAV_position_list = list(self.UAV.coordinate)

        return comprehensive_channel_elements_list + UAV_position_list

    def store_current_system_sate(self):
        """
        function used in step() to store system state
        """
        # 1 store beamforming matrix
        row_data = list(np.array(np.reshape(self.UAV.G, (1, -1)))[0,:])
        self.data_manager.store_data(row_data, 'beamforming_matrix')
        # 2 store reflecting coefficient matrix
        row_data = list(np.array(np.reshape(diag(self.RIS.Phi), (1,-1)))[0,:])      
        self.data_manager.store_data(row_data, 'reflecting_coefficient')
        # 3 store UAV state
        row_data = list(self.UAV.coordinate)
        self.data_manager.store_data(row_data, 'UAV_state')
        # 4 store user_capicity
        row_data = [user.secure_capacity for user in self.user_list] \
        + [user.capacity for user in self.user_list]
        # 5 store G_power
        row_data = [np.trace(self.UAV.G*self.UAV.G.H), self.UAV.G_Pmax]
        self.data_manager.store_data(row_data, 'G_power')
        row_data = []
        for user in self.user_list:
            row_data.append(user.capacity)
        self.data_manager.store_data(row_data, 'user_capacity')

        row_data = []
        for attacker in self.attacker_list:
            row_data.append(attacker.capacity)
        self.data_manager.store_data(row_data, 'attaker_capacity')

        row_data = []
        for user in self.user_list:
            row_data.append(user.secure_capacity)
        self.data_manager.store_data(row_data, 'secure_capacity')


    def update_channel_capacity(self):
        """
        function used in step to calculate user and attackers' capacity 
        """
        # 1 calculate eavesdrop rate
        for attacker in self.attacker_list:
            attacker.capacity = self.calculate_capacity_array_of_attacker_p(attacker.index)
            self.eavesdrop_capacity_array[attacker.index, :] = attacker.capacity
            # remmeber to update comprehensive_channel
            attacker.comprehensive_channel = self.calculate_comprehensive_channel_of_attacker_p(attacker.index)
        # 2 calculate unsecure rate
        for user in self.user_list:
            user.capacity = self.calculate_capacity_of_user_k(user.index)
            # 3 calculate secure rate
            user.secure_capacity = self.calculate_secure_capacity_of_user_k(user.index)
            # remmeber to update comprehensive_channel
            user.comprehensive_channel = self.calculate_comprehensive_channel_of_user_k(user.index)

    def calculate_comprehensive_channel_of_attacker_p(self, p):
        """
        used in update_channel_capacity to calculate the comprehensive_channel of attacker p
        """
        h_U_p = self.h_U_p[p].channel_matrix
        h_R_p = self.h_R_p[p].channel_matrix
        Psi = diag_to_vector(self.RIS.Phi)
        H_c = vector_to_diag(h_R_p).H * self.H_UR.channel_matrix
        return h_U_p.H + Psi.H * H_c

    def calculate_comprehensive_channel_of_user_k(self, k):
        """
        used in update_channel_capacity to calculate the comprehensive_channel of user k
        """
        h_U_k = self.h_U_k[k].channel_matrix
        h_R_k = self.h_R_k[k].channel_matrix
        Psi = diag_to_vector(self.RIS.Phi)
        H_c = vector_to_diag(h_R_k).H * self.H_UR.channel_matrix
        return h_U_k.H + Psi.H * H_c

    def calculate_capacity_of_user_k(self, k):
        """
        function used in update_channel_capacity to calculate one user
        """     
        noise_power = self.user_list[k].noise_power
        h_U_k = self.h_U_k[k].channel_matrix
        h_R_k = self.h_R_k[k].channel_matrix
        Psi = diag_to_vector(self.RIS.Phi)
        H_c = vector_to_diag(h_R_k).H * self.H_UR.channel_matrix
        G_k = self.UAV.G[:, k]
        G_k_ = 0
        if len(self.user_list) == 1:
            G_k_ = np.mat(np.zeros((self.UAV.ant_num, 1), dtype=complex), dtype=complex)
        else:
            G_k_1 = self.UAV.G[:, 0:k]
            G_k_2 = self.UAV.G[:, k+1:]
            G_k_ = np.hstack((G_k_1, G_k_2))
        alpha_k = math.pow(abs((h_U_k.H + Psi.H * H_c) * G_k), 2)
        beta_k = math.pow(np.linalg.norm((h_U_k.H + Psi.H * H_c)*G_k_), 2) + dB_to_normal(noise_power) * 1e-3
        return math.log10(1 + abs(alpha_k / beta_k))

    def calculate_capacity_array_of_attacker_p(self, p):
        """
        function used in update_channel_capacity to calculate one attacker capacities to K users
        output is a K length np.array ,shape: (K,)
        """
        K = len(self.user_list)
        noise_power = self.attacker_list[p].noise_power
        h_U_p = self.h_U_p[p].channel_matrix
        h_R_p = self.h_R_p[p].channel_matrix
        Psi = diag_to_vector(self.RIS.Phi)
        H_c = vector_to_diag(h_R_p).H * self.H_UR.channel_matrix
        if K == 1:
            G_k = self.UAV.G
            G_k_ = np.mat(np.zeros((self.UAV.ant_num, 1), dtype=complex), dtype=complex)
            alpha_p = math.pow(abs((h_U_p.H + Psi.H * H_c) * G_k), 2)
            beta_p = math.pow(np.linalg.norm((h_U_p.H + Psi.H * H_c)*G_k_), 2) + dB_to_normal(noise_power) * 1e-3
            return np.array([math.log10(1 + abs(alpha_p / beta_p))])
        else:
            result = np.zeros(K)
            for k in range(K):
                G_k = G_k = self.UAV.G[:, k]
                G_k_1 = self.UAV.G[:, 0:k]
                G_k_2 = self.UAV.G[:, k+1:]
                G_k_ = np.hstack((G_k_1, G_k_2))
                alpha_p = math.pow(abs((h_U_p.H + Psi.H * H_c) * G_k), 2)
                beta_p = math.pow(np.linalg.norm((h_U_p.H + Psi.H * H_c)*G_k_), 2) + dB_to_normal(noise_power) * 1e-3
                result[k] = math.log10(1 + abs(alpha_p / beta_p))
            return result

    def calculate_secure_capacity_of_user_k(self, k=2):
        """
        function used in update_channel_capacity to calculate the secure rate of user k
        """
        user = self.user_list[k]
        R_k_unsecure = user.capacity
        R_k_maxeavesdrop = max(self.eavesdrop_capacity_array[:, k])
        secrecy_rate= max(0, R_k_unsecure - R_k_maxeavesdrop)
        return secrecy_rate

    def get_system_action_dim(self):
        """
        function used in main function to get the dimention of actions
        """
        result = 0
        # 0 UAV movement
        result += 2
        # 1 RIS reflecting elements
        if self.if_with_RIS:
            result += self.RIS.ant_num   
        else:
            result += 0
        # 2 beamforming matrix dimention
        result += 2 * self.UAV.ant_num * self.user_num 
        return result

    def get_system_state_dim(self):
        """
        function used in main function to get the dimention of states
        """
        result = 0
        # users' and attackers' comprehensive channel
        result += 2 * (self.user_num + self.attacker_num) * self.UAV.ant_num
        # UAV position
        if self.if_UAV_pos_state:
            result += 3
        return result


================================================
FILE: env1.py
================================================
import numpy as np
import math
import cmath
np.random.seed(42)

class minimal_IRS_system():
    def __init__(self, BS_M = 8, IRS_N_x = 4, IRS_N_y =2, K = 8, statistic = False):
        self.M = BS_M  # number of BS antennas
        self.N = IRS_N_x * IRS_N_y  # number of IRS elements
        self.K = K  # number of users
        self.z_BS = 0 # hight of BS antenas
        self.BS_rotation = 0 #
        self.BS_coordinate = np.array([0, 0, 30])
        self.BS_elevation_angle = 0 
        self.BS_max_power = 1000 # BS power constrain, W

        self.BS_normal_vecter = np.array([math.cos(self.BS_rotation), math.sin(self.BS_rotation), math.sin(self.BS_elevation_angle)*np.linalg.norm([-math.cos(self.BS_rotation), math.sin(self.BS_rotation)])])
        self.BS_normal_vecter = self.BS_normal_vecter / np.linalg.norm(self.BS_normal_vecter)

        # channel parameters class
        class channel_parameters():
            """
            channel parameters
            """
            def __init__(self):
                self.noise_dBm = noise_dBm = -114       # channel noise, -114 dBm
                self.noise_segma = 0                    # channel noise std,
                self.d_0 = 1                            # path loss referance distance 1 m
                self.rho_0 = 0.01                       # path loss parameter
                self.alpha_BS_to_IRS = 3                       # path loss exponent
                self.alpha_IRS_to_user = 2.5                      # path loss exponent
                self.alpha_BS_to_user = 3.5                      # path loss exponent
                self.K_BS = math.pow(10, 3/10)          # rician factor, 3dB
                self.K_IRS = math.pow(10, 3/10)         # rician factor, 3dB
        self.channel_parameters = channel_parameters()
        self.channel_parameters.noise_segma = math.pow(self.dBm_to_W(self.channel_parameters.noise_dBm),0.5)

        # IRS initial
        class IRS():
            """
            store user parameters
            """
            def __init__(self):
                self.IRS_N_x = IRS_N_x
                self.IRS_N_y = IRS_N_y
                self.IRS_N = IRS_N_x * IRS_N_y
                self.x_IRS = 1000                       # position x of IRS, 1000 m
                self.y_IRS = 0                          # position y of IRS, 0 m
                self.z_IRS = 25                          # position z of IRS, 0 m
                self.IRS_coordinate = np.array([self.x_IRS,self.y_IRS,self.z_IRS])
                self.deltaD_to_lambda = 0.5
                self.d_BS_to_IRS = np.linalg.norm(self.IRS_coordinate)
                self.theta_IRS = math.atan(self.y_IRS / self.x_IRS) # theta_IRS
                self.psi_IRS = 0                        # rotation of IRS , range: -pi/2 + theta_IRS  ~  pi/2 - theta_IRS
                self.elevation_angle = 0                # elevation angle

                self.IRS_normal_vecter = np.array([-math.cos(self.psi_IRS), math.sin(self.psi_IRS),math.sin(self.elevation_angle)*np.linalg.norm([-math.cos(self.psi_IRS), math.sin(self.psi_IRS)])])
                self.IRS_normal_vecter = self.IRS_normal_vecter / np.linalg.norm(self.IRS_normal_vecter)
                self.theta_BS_to_IRS = 0
                self.psi_BS_to_IRS = 0
                self.theta_IRS_to_BS = 0
                self.psi_IRS_to_BS = 0
                
        self.IRS = IRS()
        
        # users initial
        class user():
            """
            store user parameters
            """
            def __init__(self):
                self.user_index = 0                     # user ID
                self.distance_to_IRS = 10               # distance to IRS(need caculation), range of (10, 30)
                self.psi = 0                            # psi angle to IRS
                self.psi_IRS_surface = 0                # psi used to calculate the UPA response
                self.x_user = 10                        # position x 
                self.y_user = 10                        # position y
                self.z_user = 0                         # position z
                self.user_coordinate = np.array([self.x_user,self.y_user,self.z_user])
                self.distance_to_BS = np.linalg.norm(self.user_coordinate)                 # distance to BS
                self.theta_user = 0                     # theta to BS
                self.theta_IRS_to_kth_user = 0
                self.psi_IRS_to_kth_user = 0
                
        mini_distance = 10                                                 # minimal distance for users to IRS, m
        max_distance = 30                                                  # maxmial distance for users to IRS, m
        statistic_distance_list = [10, 12, 14, 17, 20, 23, 25, 29]
        statistic_angle_list = [0, math.pi/4, math.pi/2, 3*math.pi/4, math.pi, 5*math.pi/4, 3*math.pi/2, 7*math.pi/4]
        stocastic_distance_list = np.random.uniform(mini_distance, max_distance, self.K)    # list of dis
        stocastic_angle_list = np.random.uniform(- math.pi / 2, math.pi / 2, self.K)                      # list of angle
        self.user_list = []
        for index_user in range(K):
            user_temp = user()
            user_temp.user_index = index_user
            if statistic == True:
                user_temp.distance_to_IRS = statistic_distance_list[index_user]
                user_temp.psi = statistic_angle_list[index_user]
                user_temp.x_user = self.IRS.x_IRS - user_temp.distance_to_IRS * math.cos(user_temp.psi)
                user_temp.y_user = self.IRS.y_IRS + user_temp.distance_to_IRS * math.sin(user_temp.psi)
            else:
                user_temp.distance_to_IRS = stocastic_distance_list[index_user]
                user_temp.psi = stocastic_angle_list[index_user]
                user_temp.x_user = self.IRS.x_IRS - user_temp.distance_to_IRS * math.cos(user_temp.psi)
                user_temp.y_user = self.IRS.y_IRS + user_temp.distance_to_IRS * math.sin(user_temp.psi)
            user_temp.psi_IRS_surface = user_temp.psi - self.IRS.psi_IRS
            #user_temp.distance_to_BS = np.linalg.norm([user_temp.x_user, user_temp.y_user])
            user_temp.theta_user = math.atan(user_temp.y_user / user_temp.x_user )
            user_temp.user_coordinate = np.array([user_temp.x_user,user_temp.y_user,user_temp.z_user])
            self.user_list.append(user_temp)
            

        # transmit signal
        self.X_transmit = np.mat(np.ones((K, 1)))       # BS transmit signal, unit variance, shape: K X 1

        # active beamforming at BS
        self.G_beamforming = np.mat(np.ones((self.M, K),dtype=complex))    # BS beamforming matrix, shape: M X K
        self.BS_max_power = self.calculate_total_transmit_power()
        self.reset_G = np.mat(np.ones((self.M, K),dtype=complex))

        # channal gain from BS to IRS
        #self.H_BS_to_IRS = np.mat(np.ones((self.N, self.M)))      # channel gain BS to IRS, shape: N X M
        self.H_BS_to_IRS = self.calculate_channel_BS_to_IRS(self.IRS)

        # channal gain from IRS to users
        # self.H_IRS_to_user = np.mat(np.ones((K, self.N),dtype=complex))    # channel gain IRS to user, shape: K X N
        self.H_IRS_to_user = self.calculate_channel_IRS_to_user(self.IRS)

        # IRS phase shifter parameter
        self.Fai = np.mat(np.identity(self.N), dtype=complex)               # IRS phase shift 
        self.reset_Fai = np.mat(np.identity(self.N), dtype=complex)
        # data rate matrex container
        self.data_rate = np.mat(np.ones((K,1)))         # channal capacities for all K users

        # calculate sum of all user data rate
        self.sum_of_user_data_rates = self.calculate_data_rate()

    def calculate_total_transmit_power(self):
        """
        calculate total power
        """
        G = self.G_beamforming
        G_H = G.H
        temp_power = 0
        for m in range(self.M):
            temp_power += G[m,:] * G_H[:, m]
        return float(np.real(temp_power[0,0]))

    def dBm_to_W(self, dBm):
        mW = math.pow(10, dBm/10)
        W = mW/1000.0
        return W

    def generate_noise(self, mu, segma, size = 1):
        """
        generates noise of variance segma ^ 2
        size can be a tuple like (3,3)

        output :    a float of noise
                    or an array with size size of average mu and std of segma
        """
        result = np.random.normal(mu, segma, size)
        if size == 1:
            return float(result)
        else:
            return result

    def calculate_received_signal_amplitude(self):
        """
        calculate_received_signal_amplitude from BS to user

        output :    a matix , size: K X 1
        """   
        noise_array = self.generate_noise(0, self.channel_parameters.noise_segma, (self.K,1))
        noise_matrix = np.mat(noise_array)

        Y_matrix = self.H_IRS_to_user * self.Fai * self.H_BS_to_IRS * self.G_beamforming * self.X_transmit + noise_matrix
        return Y_matrix

    def calculate_SINR_of_kth_user(self, k):
        """
        calculate kth user's SINR

        output  :   a float
        """
        expected_amplitude = abs(self.H_IRS_to_user[k] * self.Fai * self.H_BS_to_IRS * self.G_beamforming[:, k])
        expected_power = math.pow(expected_amplitude, 2)

        noise_power = 0
        noise_power = noise_power + math.pow(self.channel_parameters.noise_segma, 2)
        for n in range(self.K):
            if n != k:
                noise_power = noise_power + math.pow(abs(self.H_IRS_to_user[n] * self.Fai * self.H_BS_to_IRS * self.G_beamforming[:, n]),2)
        return expected_power/noise_power

    def calculate_data_rate(self):
        """
        calculate K data rates

        output :    sum of datarate
        """
        for kth in range(self.K):
            self.data_rate[kth][0] = math.log2(1 + self.calculate_SINR_of_kth_user(kth))
        return float(self.data_rate.sum())

    def calculate_theta_and_psi(self, ref_norm_vector, coordinate):
        """
        calculate_theta_and_psi
        """
        theta_ref_to_coor =  math.acos(abs(np.dot(coordinate/ np.linalg.norm(coordinate),ref_norm_vector)))
        coordinate_xoy = coordinate - abs(math.cos(theta_ref_to_coor)) * np.linalg.norm(coordinate) * ref_norm_vector
        temp_x_vector = [ref_norm_vector[1], -ref_norm_vector[0], 0]
        psi_ref_to_coor = abs(np.dot(temp_x_vector, coordinate_xoy)) / (np.linalg.norm(temp_x_vector) * np.linalg.norm(coordinate_xoy))
        if math.isnan(psi_ref_to_coor):
            psi_ref_to_coor = 0
        
        return theta_ref_to_coor, psi_ref_to_coor
    def calculate_channel_BS_to_IRS(self, IRS): # IRS elements spacing half wavelength
        """
        denate the self.H_BS_to_IRS (N X M)
        """
        temp_vertical_matrix = np.zeros((IRS.IRS_N,1), dtype=complex)
        temp_horizontal_matrix = np.zeros((1, self.M), dtype=complex)
        """
        theta_BS_to_IRS = math.acos(np.dot(IRS.IRS_coordinate/ np.linalg.norm(IRS.IRS_coordinate),self.BS_normal_vecter))
        IRS_xoy = IRS.IRS_coordinate - abs(math.cos(theta_BS_to_IRS)) * np.linalg.norm(IRS.IRS_coordinate) * self.BS_normal_vecter
        temp_x_BS = [self.BS_normal_vecter[1], -self.BS_normal_vecter[0], 0]
        psi_BS_to_IRS = abs(np.dot(temp_x_BS, IRS_xoy)) / (np.linalg.norm(temp_x_BS) * np.linalg.norm(IRS_xoy))
        if math.isnan(psi_BS_to_IRS):
            psi_BS_to_IRS = 0
        
        theta_IRS_to_BS = math.acos(abs(np.dot(IRS.IRS_coordinate/ np.linalg.norm(IRS.IRS_coordinate), IRS.IRS_normal_vecter)))
        BS_xoy = -1 * np.array(IRS.IRS_coordinate) - abs(math.cos(theta_IRS_to_BS)) * np.linalg.norm(IRS.IRS_coordinate) * IRS.IRS_normal_vecter
        temp_x_IRS = [IRS.IRS_normal_vecter[1], -IRS.IRS_normal_vecter[0], 0]
        psi_IRS_to_BS = abs(np.dot(temp_x_IRS, BS_xoy)) / (np.linalg.norm(temp_x_IRS) * np.linalg.norm(BS_xoy))
        if math.isnan(psi_IRS_to_BS):
            psi_IRS_to_BS = 0
        # old way to calculate theta and psi
        """
        theta_BS_to_IRS, psi_BS_to_IRS = self.calculate_theta_and_psi(self.BS_normal_vecter, IRS.IRS_coordinate - self.BS_coordinate)
        theta_IRS_to_BS, psi_IRS_to_BS = self.calculate_theta_and_psi(IRS.IRS_normal_vecter, self.BS_coordinate - IRS.IRS_coordinate)
        IRS.theta_BS_to_IRS = theta_BS_to_IRS
        IRS.psi_BS_to_IRS = psi_BS_to_IRS
        IRS.theta_IRS_to_BS = theta_IRS_to_BS
        IRS.psi_IRS_to_BS = psi_IRS_to_BS

        for i in range(IRS.IRS_N_y):
            for j in range(IRS.IRS_N_x):
                temp_vertical_matrix[i * IRS.IRS_N_x + j][0] = \
                cmath.exp(-1j * 2 * math.pi * IRS.deltaD_to_lambda * \
                (i * math.sin(theta_IRS_to_BS)* math.cos(psi_IRS_to_BS)+ \
                j * math.sin(theta_IRS_to_BS)* math.sin(psi_IRS_to_BS)) \
                )
        for i in range(self.M):
            temp_horizontal_matrix[0][i] = \
            cmath.exp(-1j * 2 * math.pi * IRS.deltaD_to_lambda * \
            (i * math.sin(theta_BS_to_IRS)* math.cos(psi_BS_to_IRS)) \
            )

        temp_horizontal_matrix = np.mat(temp_horizontal_matrix)
        temp_vertical_matrix = np.mat(temp_vertical_matrix)
        H_LOS = temp_vertical_matrix * temp_horizontal_matrix

        H = math.pow(self.channel_parameters.rho_0* (math.pow(IRS.d_BS_to_IRS/self.channel_parameters.d_0, - self.channel_parameters.alpha_BS_to_IRS)), 0.5) \
        * (H_LOS)
        return H
        
    def calculate_channel_IRS_to_user(self, IRS):
        """
        channel gain IRS to user, shape: K X N
        """
        temp_array = np.zeros((self.K, self.N), dtype=complex)

        for kth in range(self.K):
            theta_IRS_to_kth_user, psi_IRS_to_kth_user = self.calculate_theta_and_psi(IRS.IRS_normal_vecter, self.user_list[kth].user_coordinate - IRS.IRS_coordinate)
            self.user_list[kth].theta_IRS_to_kth_user = theta_IRS_to_kth_user
            self.user_list[kth].psi_IRS_to_kth_user = psi_IRS_to_kth_user
            for i in range(IRS.IRS_N_y):
                for j in range(IRS.IRS_N_x):
                    temp_array[kth][i * IRS.IRS_N_x + j] = \
                    cmath.exp(-1j * 2 * math.pi * IRS.deltaD_to_lambda * \
                    (i * math.sin(theta_IRS_to_kth_user)* math.cos(psi_IRS_to_kth_user)+ \
                    j * math.sin(theta_IRS_to_kth_user)* math.sin(psi_IRS_to_kth_user)) \
                    )
        H_LOS = temp_array
        for kth in range(self.K):
            H_LOS[kth] = math.pow(self.channel_parameters.rho_0* (math.pow(np.linalg.norm(IRS.IRS_coordinate - self.user_list[kth].user_coordinate)/self.channel_parameters.d_0, - self.channel_parameters.alpha_BS_to_IRS)), 0.5) \
            * (H_LOS[kth])
        
        H = np.mat(H_LOS)
        return H

    def reset(self):
        """
        RL implement, 
        """
        # 1 reset all parameters
        self.G_beamforming = self.reset_G
        self.Fai = self.reset_Fai
        # 2 get state
        return self.get_state()

    def apply_action(self, action):
        """
        apply action
        """
        # 0 parameters and all kinds of matrix
        K = self.K
        M = self.M
        N = self.N
        # 1 apply action
        # 1.1 divide action list into two parts
        # 1.1.1 beamforming part, MXK
        action_beamforming = action[0 : 2*M*K]
        # 1.1.2 IRS reflecting coinfetions
        action_Fai = action[2*M*K : 2*M*K + 2*N]

        # 1.2 apply action
        # 1.2.1 beamforming part, MXK
        for m in range(M):
            for k in range(K):
                index = m * K + k
                temp_Gmk = action_beamforming[2*index] + 1j * action_beamforming[2*index + 1]
                self.G_beamforming[m, k] = temp_Gmk
        # 1.2.2 IRS reflecting coinfetions
        for n in range(N):
            temp_Fainn = action_Fai[2*n] + 1j * action_Fai[2*n + 1]
            self.Fai[n, n] = temp_Fainn

        return True
    def get_state(self):
        """
        get current state
        """
        # 0 parameters and all kinds of matrix
        K = self.K
        M = self.M
        N = self.N
        new_state_dims = 2*K + 2*K**2 + 2*N + 2*M*K + 2*N*M + 2*K*N
        
        # 1 get new state
        new_state = []
        G = self.G_beamforming
        G_H = self.G_beamforming.H
        H_IRS_to_user = self.H_IRS_to_user
        Fai = self.Fai
        H_BS_to_IRS = self.H_BS_to_IRS
        # 1.1 transmit power to k users, 2K
        state_tran_power_BS_to_users = [] # result container
        for kth in range(K):
            state_tran_power_BS_to_users.append(\
            math.pow(float(np.real(G_H[kth,:] * G[:,kth])),2))
            state_tran_power_BS_to_users.append(\
            math.pow(float(np.imag(G_H[kth,:] * G[:,kth])),2))
        
        # 1.2 received power for users, 2K^2
        state_action_receive_power_for_users = []
        for k1 in range(K):
            user_k1_received_power_from_all_other_users = \
            H_IRS_to_user[k1, :] * Fai * H_BS_to_IRS * G
            for k2 in range(K):
                state_action_receive_power_for_users.append(\
                math.pow(np.real(user_k1_received_power_from_all_other_users[0, k2]), 2))
                state_action_receive_power_for_users.append(\
                math.pow(np.imag(user_k1_received_power_from_all_other_users[0, k2]), 2))
                
        # 1.3 beamforming at BS, 2*M*K+2*N
        state_beamforming_at_BS = []
        for m in range(M):
            for k in range(K):
                state_beamforming_at_BS.append(\
                math.pow(np.real(G[m, k]), 2))
                state_beamforming_at_BS.append(\
                math.pow(np.imag(G[m, k]), 2))
        for n in range(N):
            state_beamforming_at_BS.append(\
            math.pow(np.real(Fai[n, n]), 2))
            state_beamforming_at_BS.append(\
            math.pow(np.imag(Fai[n, n]), 2))

        # 1.4 channel state at both BS-IRS & IRS-users, 2*N*M+2*K*M
        state_channel_BS_IRS_and_IRS_users = []
        for n in range(N):
            for m in range(M):
                state_channel_BS_IRS_and_IRS_users.append(\
                math.pow(np.real(H_BS_to_IRS[n, m]), 2))
                state_channel_BS_IRS_and_IRS_users.append(\
                math.pow(np.imag(H_BS_to_IRS[n, m]), 2))
        for k in range(K):
            for n in range(N):
                state_channel_BS_IRS_and_IRS_users.append(\
                math.pow(np.real(H_IRS_to_user[k, n]), 2))
                state_channel_BS_IRS_and_IRS_users.append(\
                math.pow(np.imag(H_IRS_to_user[k, n]), 2))
        # 1.5 finnish getting state
        new_state.append(state_tran_power_BS_to_users)
        new_state.append(state_action_receive_power_for_users)
        new_state.append(state_beamforming_at_BS)
        new_state.append(state_channel_BS_IRS_and_IRS_users)

        result = state_tran_power_BS_to_users + \
        state_action_receive_power_for_users + \
        state_beamforming_at_BS + \
        state_channel_BS_IRS_and_IRS_users
        return result
        

    def step(self, action):
        """
        RL function ,state is a 1 X (2K+2K^2+2N+2MK+2NM+2KN) vecter
        """
        done = False
        # 1 get new state
        new_state = self.get_state()
        # 2 apply action 
        self.apply_action(action)
        # 3 judge if done
        total_power = self.calculate_total_transmit_power()
        if total_power > self.BS_max_power:
            done = True
        # 4 get other reward
        reward = 0
        if total_power > self.BS_max_power:
            reward = np.real(self.BS_max_power - total_power)
            # reward = np.real(self.BS_max_power[0,0] - total_power[0,0] )/1000
        else:
            reward = self.calculate_data_rate()*100

        info = 0
        return new_state, reward, done, info

    def render(self):
        """
        show system function
        """


        return True
     

================================================
FILE: load_and_plot.py
================================================
import matplotlib.pyplot as plt
import numpy as np
import cmath
from scipy.io import loadmat, savemat
import pandas as pd
import os
import copy
import math
import csv

import argparse

# get argument from user
parser = argparse.ArgumentParser()
parser.add_argument('--path', type = str, required = False, default=None, help='the path where the training/simulation data is stored')
parser.add_argument('--ep-num', type = int, required = False, default=300, help='total number of episodes')


# extract argument
args = parser.parse_args()
STORE_PATH = args.path
EP_NUM = args.ep_num


######################################################
# new for energy 
# energy related parameters of rotary-wing UAV
# based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV
P_i = 790.6715
P_0 = 580.65
U2_tip = (200) ** 2
s = 0.05
d_0 = 0.3
p = 1.225
A = 0.79
delta_time = 0.1 #0.1/1000 #0.1ms

# add ons hover veloctiy
# based on https://www.intechopen.com/chapters/57483
m = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F
g = 9.81 # gravity
T = m * g # thrust
v_0 = (T / (A * 2 * p)) ** 0.5

def get_energy_consumption(v_t):
    '''
    arg
    1) v_t = displacement per time slot
    '''
    energy_1 = P_0 \
                + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \
                + 0.5 * d_0 * p * s * A * (abs(v_t))**3
    
    energy_2 = P_i * ((
                    (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \
                    - (abs(v_t) ** 2) / (2 * (v_0 **2)) \
                ) ** 0.5)
    
    energy = delta_time * (energy_1 + energy_2)
    return energy 

ENERGY_MIN = get_energy_consumption(0.25)
ENERGY_MAX = get_energy_consumption(0)

######################################################


# modified from data_manager.py
init_data_file = 'data/init_location.xlsx'
def read_init_location(entity_type = 'user', index = 0):
    if entity_type == 'user' or 'attacker' or 'RIS' or 'RIS_norm_vec' or 'UAV':
        return np.array([\
        pd.read_excel(init_data_file, sheet_name=entity_type)['x'][index],\
        pd.read_excel(init_data_file, sheet_name=entity_type)['y'][index],\
        pd.read_excel(init_data_file, sheet_name=entity_type)['z'][index]])
    else:
        return None


# load and plot everything
class LoadAndPlot(object):
    """
    load date and plot 2022-07-22 16_16_26
    """
    def __init__(self, store_path, \
                       user_num = 2, attacker_num = 1, RIS_ant_num = 4, \
                       ep_num = EP_NUM, step_num = 100): # RIS_ant_num = 16 (not true)

        self.color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'y']
        self.store_path = store_path + '//'
        self.user_num = user_num
        self.attacker_num = attacker_num
        self.RIS_ant_num = RIS_ant_num
        self.ep_num = ep_num
        self.step_num = step_num

        self.all_steps = self.load_all_steps()


    def load_one_ep(self, file_name):
        m = loadmat(self.store_path + file_name)
        return m


    def load_all_steps(self):
        result_dic = {}
        result_dic.update({'reward':[]})

        result_dic.update({'user_capacity':[]})
        for i in range(self.user_num):
            result_dic['user_capacity'].append([])

        result_dic.update({'secure_capacity':[]})
        for i in range(self.user_num):
            result_dic['secure_capacity'].append([])

        result_dic.update({'attaker_capacity':[]})
        for i in range(self.attacker_num):
            result_dic['attaker_capacity'].append([])
        
        result_dic.update({'RIS_elements':[]})
        for i in range(self.RIS_ant_num):
            result_dic['RIS_elements'].append([])

        for ep_cnt in range(self.ep_num):
            mat_ep = self.load_one_ep("simulation_result_ep_" + str(ep_cnt) + ".mat")

            one_ep_reward = mat_ep["result_" + str(ep_cnt)]["reward"][0][0]
            result_dic['reward'] += list(one_ep_reward[:, 0])

            one_ep_user_capacity = mat_ep["result_" + str(ep_cnt)]["user_capacity"][0][0]
            for i in range(self.user_num):
                result_dic['user_capacity'][i] += list(one_ep_user_capacity[:, i])
            
            one_ep_secure_capacity = mat_ep["result_" + str(ep_cnt)]["secure_capacity"][0][0]
            for i in range(self.user_num):
                result_dic['secure_capacity'][i] += list(one_ep_secure_capacity[:, i])
            
            one_ep_attaker_capacity = mat_ep["result_" + str(ep_cnt)]["attaker_capacity"][0][0]
            for i in range(self.attacker_num):
                result_dic['attaker_capacity'][i] += list(one_ep_attaker_capacity[:, i])

            one_ep_RIS_first_element = mat_ep["result_" + str(ep_cnt)]["reflecting_coefficient"][0][0]
            for i in range(self.RIS_ant_num):
                result_dic['RIS_elements'][i] += list(one_ep_RIS_first_element[:, i])

        return result_dic


    def plot(self):
        """
        plot result
        b--blue c--cyan(青色） g--green k--black m--magenta（紫红色） r--red w--white y--yellow 
        """
        if not os.path.exists(self.store_path + 'plot'):
            os.makedirs(self.store_path + 'plot')
            os.makedirs(self.store_path + 'plot/RIS')

        color_list = ['b', 'g', 'c', 'k', 'm', 'r', 'y']
        

        ###############################
        # read step counts per episode
        ###############################
        step_num_per_episode = []
        with open(self.store_path + 'step_num_per_episode.csv', newline='') as csvfile:
            reader = csv.reader(csvfile)
            for row in reader:
                step_num_per_episode.append(int(row[0]))
        
        ###############################
        # plot reward
        ###############################
        fig = plt.figure('reward')
        plt.plot(range(len(self.all_steps['reward'])), self.all_steps['reward'])
        plt.xlabel("Time Steps ($t$)")
        plt.ylabel("Reward")
        plt.savefig(self.store_path + 'plot/reward.png')
        plt.cla()
        
        
        ###############################
        # plot secure capacity
        ###############################
        fig = plt.figure('secure_capacity')
        for i in range(self.user_num):
            plt.plot(range(len(self.all_steps['secure_capacity'][i])), self.all_steps['secure_capacity'][i], c=color_list[i])
        plt.legend(['user_' + str(i) for i in range(self.user_num)])
        plt.xlabel("Time Steps ($t$)")
        plt.ylabel("Secure Capacity")
        plt.savefig(self.store_path + 'plot/secure_capacity.png')
        plt.cla()

        
        ###############################
        # plot average sum secrecy rate of each episode
        ###############################
        fig = plt.figure('average_sum_secrecy_rate')
        sum_secrecy_rate = np.array(self.all_steps['secure_capacity'])
        sum_secrecy_rate = np.sum(sum_secrecy_rate, axis = 0)
        average_sum_secrecy_rate = []
        ssr = []
        j = 0
        for i in range(self.ep_num):
            ssr_one_episode = sum_secrecy_rate[j:j+step_num_per_episode[i]] # ssr means Sum Secrecy Rate
            #print(j, j+step_num_per_episode[i])
            j = j+step_num_per_episode[i]
            ssr.append(ssr_one_episode)
            try:
                _ = sum(ssr_one_episode) / len(ssr_one_episode)
            except:
                _ = 0
            average_sum_secrecy_rate.append(_)
        plt.plot(range(len(average_sum_secrecy_rate)), average_sum_secrecy_rate)
        plt.xlabel("Episodes (Ep)")
        plt.ylabel("Average Sum Secrecy Rate")
        plt.savefig(self.store_path + 'plot/average_sum_secrecy_rate.png')
        plt.cla()

        print()
        print('###########################################################')
        print('Metrics\t\t\tLast Episode\tMax Values Reached')
        print('###########################################################')
        print('SSR (bits/s/Hz)\t\t{:.2f}\t\t{:.2f}'.format(average_sum_secrecy_rate[-1], max(average_sum_secrecy_rate)))
        

        ###############################
        # plot secrecy energy efficient
        ###############################
        fig = plt.figure('average_secrecy_energy_efficiency')

        # get init location
        init_uav_coord = read_init_location(entity_type = 'UAV')
        init_user_coord_0 = read_init_location(entity_type = 'user', index=0)
        init_user_coord_1 = read_init_location(entity_type = 'user', index=1)
        
        ep_num = EP_NUM
        energies = []
        for i in range(ep_num):
            # read the mat file
            filename = f'simulation_result_ep_{i}.mat'
            filename = os.path.join(STORE_PATH, filename)
            data = loadmat(filename)
        
            # v_ts
            energies_one_episode = []
        
            # loop all uav movt
            uav_movt = data[f'result_{i}'][0][0][-1]
            for j in range(uav_movt.shape[0]):
                move_x = uav_movt[j][0]
                move_y = uav_movt[j][1]
                v_t = (move_x ** 2 + move_y ** 2) ** 0.5
                energy = get_energy_consumption(v_t / delta_time)
                energies_one_episode.append(energy)
            energies.append(energies_one_episode)
                
        average_see = []
        for ssr_one_episode, energies_one_episode in zip(ssr, energies):
            ssr_one_episode = ssr_one_episode[:len(energies_one_episode)]
            energies_one_episode = energies_one_episode[:len(ssr_one_episode)]
            #print(len(ssr_one_episode), len(energies_one_episode))
            
            try:
                see = np.array(ssr_one_episode) / np.array(energies_one_episode)
                average_see.append(sum(see)/len(see))
            except:
                average_see.append(0)

        
        plt.plot(range(len(average_see)), average_see)
        plt.xlabel("Episodes (Ep)")
        plt.ylabel("Average Secrecy Energy Efficiency")
        plt.savefig(self.store_path + 'plot/average_secrecy_energy_efficiency.png')
        plt.cla() 
        
        print('Energy (kJ)\t\t{:.2f}\t\t{:.2f}'.format(sum(energies[-1])/1000, sum(energies[np.argmax(average_see)])/1000))
        print('SEE (bits/s/Hz/kJ)\t{:.2f}\t\t{:.2f}'.format(average_see[-1]*1000, max(average_see)*1000))
        print('\nThe final performance is evalulated based on the Last Episode (where exploration=0)\n')
        
        
        ###############################
        # plot user capacity
        ###############################
        fig = plt.figure('user_capacity')
        for i in range(self.user_num):
            plt.plot(range(len(self.all_steps['user_capacity'][i])), self.all_steps['user_capacity'][i], c=color_list[i])
        plt.legend(['user_' + str(i) for i in range(self.user_num)])
        plt.xlabel("Time Steps ($t$)")
        plt.ylabel("User Capacity")
        plt.savefig(self.store_path + 'plot/user_capacity.png')
        plt.cla()

        
        ###############################
        # plot attacker capacity
        ###############################
        fig = plt.figure('attaker_capacity')
        for i in range(self.attacker_num):
            plt.plot(range(len(self.all_steps['attaker_capacity'][i])), self.all_steps['attaker_capacity'][i], c=color_list[i])
        plt.legend(['attacker_' + str(i) for i in range(self.attacker_num)])
        plt.xlabel("Time Steps ($t$)")
        plt.ylabel("Attack Capacity")
        plt.savefig(self.store_path + 'plot/attaker_capacity.png')
        plt.close('all')
        
        
        ###############################
        # plot ris
        ###############################
        for i in range(self.RIS_ant_num):
            self.plot_one_RIS_element(i)
            
        
        ###############################
        # plot trajectory
        ###############################
        self.plot_trajectory()

    
    def plot_one_RIS_element(self, index):
        """
        docstring
        """
        ax_real_imag = plt.subplot(1,1,1)
        ax_pase = ax_real_imag.twinx()
        #plt.ylim(ymax = 1, ymin = -1)
        #plt.xlim(xmax = 10000 , xmin = 10000 - 100)
        ax_real_imag.plot(range(len(self.all_steps['RIS_elements'][index])), np.real(self.all_steps['RIS_elements'][index]), c = self.color_list[0])
        ax_real_imag.plot(range(len(self.all_steps['RIS_elements'][index])), np.imag(self.all_steps['RIS_elements'][index]), c = self.color_list[1])
        phase_list = []
        for complex_num in self.all_steps['RIS_elements'][index]:
            phase_list.append(cmath.phase(complex_num))
        plt.ylim(ymax = cmath.pi, ymin = -cmath.pi)
        ax_pase.plot(range(len(self.all_steps['RIS_elements'][index])), phase_list, c = self.color_list[2])
#        plt.xlabel("Time Steps ($t$)")
#        plt.ylabel("RIS Dimension")
        # plt.set_ylabel("position")
        # plt.set_ylabel("position")
        # plt.set_xlabel("Time Steps ($t$)")
        plt.savefig(self.store_path + 'plot/RIS/RIS_' + str(index) + '_element.png')
        plt.close('all')
        pass

        
    def plot_trajectory(self):
        # get init location
        init_uav_coord = read_init_location(entity_type = 'UAV')
        init_user_coord_0 = read_init_location(entity_type = 'user', index=0)
        init_user_coord_1 = read_init_location(entity_type = 'user', index=1)
        
        ep_num = EP_NUM
        interval = int(0.2 * EP_NUM)
        ep_list = [0] + [i for i in range(20-1, ep_num, interval)]
        if EP_NUM - 1 not in ep_list: ep_list.append(EP_NUM - 1)
        color_list_template = ['b', 'g', 'c', 'k', 'm', 'r', 'y', 'black', 'red']
        
        
        color_list = copy.deepcopy(color_list_template)
        for i in ep_list:
            # read the mat file
            filename = f'simulation_result_ep_{i}.mat'
            filename = os.path.join(STORE_PATH, filename)
            data = loadmat(filename)
        
            # uav movt
            uav_coord = [ [init_uav_coord[0]], [init_uav_coord[1]] ]
        
            uav_movt = data[f'result_{i}'][0][0][-1]
            for j in range(uav_movt.shape[0]):
                move_x = uav_movt[j][0]
                move_y = uav_movt[j][1]
        
                prev_x = uav_coord[0][-1]
                prev_y = uav_coord[1][-1]
        
                current_x = prev_x + move_x
                current_y = prev_y + move_y
        
                uav_coord[0].append(current_x)
                uav_coord[1].append(current_y)
            plt.plot(uav_coord[1],uav_coord[0], c=color_list.pop(0))
        
        # user 0 movt
        direction_fai = -1/2*math.pi 
        distance_delta_d = 0.2
        user_coord_0 = [ [init_user_coord_0[0]], [init_user_coord_0[1]] ]
        #color_list = copy.deepcopy(color_list_template)
        for j in range(uav_movt.shape[0]):
            delta_x = distance_delta_d * math.cos(direction_fai)
            delta_y = distance_delta_d * math.sin(direction_fai)
        
            prev_x = user_coord_0[0][-1]
            prev_y = user_coord_0[1][-1]
        
            current_x = prev_x + delta_x
            current_y = prev_y + delta_y
        
            user_coord_0[0].append(current_x)
            user_coord_0[1].append(current_y)
        plt.plot(user_coord_0[1],user_coord_0[0], c=color_list.pop(0))
        plt.plot(user_coord_0[1][0], user_coord_0[0][0], marker="o", markersize=10, markeredgecolor="red", markerfacecolor="red")
        
        
        # user 1 movt
        direction_fai = -1/2*math.pi 
        distance_delta_d = 0.2
        user_coord_0 = [ [init_user_coord_1[0]], [init_user_coord_1[1]] ]
        #color_list = copy.deepcopy(color_list_template)
        for j in range(uav_movt.shape[0]):
            delta_x = distance_delta_d * math.cos(direction_fai)
            delta_y = distance_delta_d * math.sin(direction_fai)
        
            prev_x = user_coord_0[0][-1]
            prev_y = user_coord_0[1][-1]
        
            current_x = prev_x + delta_x
            current_y = prev_y + delta_y
        
            user_coord_0[0].append(current_x)
            user_coord_0[1].append(current_y)
        plt.plot(user_coord_0[1],user_coord_0[0], c=color_list.pop(0))
        plt.plot(user_coord_0[1][0], user_coord_0[0][0], marker="o", markersize=10, markeredgecolor="red", markerfacecolor="red")
        
        plt.legend(ep_list)
        plt.grid()
        plt.xlim(0, 50)
        plt.ylim(-10, 30)
        plt.gca().invert_yaxis()
        plt.savefig(self.store_path + 'plot/trajectory.png')
        plt.cla()


    def restruct(self):
        savemat(self.store_path + 'all_steps.mat',self.all_steps)
        return 0
if __name__ == '__main__':
    LoadPlotObject = LoadAndPlot(
        store_path = STORE_PATH,
        )
    LoadPlotObject.plot()
    LoadPlotObject.restruct()

    



================================================
FILE: main_RIS.py
================================================
from env import *
episode_num= 100
step_num = 100
a= int(5/5 * episode_num * step_num)
print(a)

================================================
FILE: main_ref.py
================================================
from ddpg import Agent
from env1 import minimal_IRS_system # orginal
from env import MiniSystem
import numpy as np
#from utils import plotLearning
import matplotlib.pyplot as plt
import os
from data_manager import DataManager
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"

IRS_system = minimal_IRS_system(K = 1)
#IRS_system = MiniSystem()
K = IRS_system.K
M = IRS_system.M
N = IRS_system.N
RL_state_dims = 2*K + 2*K**2 + 2*N + 2*M*K + 2*N*M + 2*K*N
RL_input_dims = RL_state_dims
RL_action_dims = 2 * (M * K) + N

steps_per_ep = 200
alpha_actor_learning_rate = 0.001
beta_critic_learning_rate = 0.001
agent = Agent(alpha=alpha_actor_learning_rate, beta=beta_critic_learning_rate, input_dims=[RL_input_dims], tau=0.001, env=IRS_system,
              batch_size=64,  layer1_size=400 * 2, layer2_size=300 * 2, n_actions=RL_action_dims)

scores = []
for i in range(1000):
    observersion = IRS_system.reset()
    done = False
    done_sys = False
    score = 0
    cnt_in_one_epi = 0
    best_bit_per_Hz = 0
    draw_bit_rate_list = []
    draw_tran_power_list = []
    #draw_bit_rate_one_element = {'if_exceed_max_power':False,'bit_rate' : 0}
    while not done:
        cnt_in_one_epi += 1
        if cnt_in_one_epi > 500:
            done = True
        action = agent.choose_action(observersion)
        new_state, reward, done_sys , info = IRS_system.step(action)

        bit_per_Hz = IRS_system.calculate_data_rate()   
        #draw_bit_rate_one_element['bit_rate'] = bit_per_Hz 
        #draw_bit_rate_one_element['if_exceed_max_power']=done_sys
        draw_bit_rate_list.append(bit_per_Hz)

        total_power = IRS_system.calculate_total_transmit_power()
        draw_tran_power_list.append(total_power)
        if done_sys == False:# if not exceed max transmit power            
            if bit_per_Hz > best_bit_per_Hz:
                best_bit_per_Hz = bit_per_Hz
        agent.remember(observersion, action, reward, new_state, int(done))
        agent.learn()
        score += reward
        observersion = new_state
        IRS_system.render()
    plt.cla()
    plt.plot(range(len(draw_bit_rate_list)), draw_bit_rate_list, color = 'green')
    plt.plot(range(len(draw_tran_power_list)), draw_tran_power_list, color = 'red')

    # plt.show()
    filename_i =os.path.abspath(os.curdir) + '\\main_foder\\image_result\\' + str(i) + '.png'
    plt.savefig(filename_i)
    scores.append(score)
    #if i % 25 == 0:
        #agent.save_models()

    print('episode ', i, 'score %.2f' % score, 'best sum rate %.3f bit/s/Hz' % best_bit_per_Hz,
          'trailing 100 games avg %.4f' % np.mean(scores[-100:]))
filename = 'C:\\demo\\IRS_DDPG_minimal\\main_foder\\LunarLander-alpha000025-beta00025-400-300.png'
# plotLearning(scores, filename, window=100)


================================================
FILE: main_train.py
================================================
# debug field
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"

import argparse

# get argument from user
parser = argparse.ArgumentParser()
parser.add_argument('--drl', type = str, required = True, default='td3', help="which drl algo would you like to choose ['ddpg', 'td3']")
parser.add_argument('--reward', type = str, required = True, default='see', help="which reward would you like to implement ['ssr', 'see']")
parser.add_argument('--seeds', type = int, required = False, default=None,  nargs='+', help="what seed(s) would you like to use for DRL 1 and 2, please provide in one or two int")
parser.add_argument('--ep-num', type = int, required = False, default=300, help="how many episodes do you want to train your DRL")
parser.add_argument('--trained-uav', default=False, action='store_true', help='use trained uav instead of retraining')

args = parser.parse_args()
DRL_ALGO = args.drl
REWARD_DESIGN = args.reward
SEEDS = args.seeds
EPISODE_NUM = args.ep_num
TRAINED_UAV = args.trained_uav

# process the argument
assert DRL_ALGO in ['ddpg', 'td3'], "drl must be ['ddpg', 'td3']"
assert REWARD_DESIGN in ['ssr', 'see'], "reward must be ['ssr', 'see']"
if SEEDS is not None:
    assert len(SEEDS) in [1, 2] and isinstance(SEEDS[0], int) and isinstance(SEEDS[-1], int), "seeds must be a list of 1 or 2 integer"

if DRL_ALGO == 'td3':
    from td3 import Agent
elif DRL_ALGO == 'ddpg':
    from ddpg import Agent
import ddpg

from env import MiniSystem
import numpy as np
import math
import time
import torch

# 1 init system model
episode_num = EPISODE_NUM # recommend to be 300
episode_cnt = 0
step_num = 100

project_name = f'trained_uav/{DRL_ALGO}_{REWARD_DESIGN}' if TRAINED_UAV else f'scratch/{DRL_ALGO}_{REWARD_DESIGN}'

system = MiniSystem(
    user_num=2,
    RIS_ant_num=4,
    UAV_ant_num=4,
    if_dir_link=1,
    if_with_RIS=True,
    if_move_users=True,
    if_movements=True,
    reverse_x_y=(False, False),
    if_UAV_pos_state = True,
    reward_design = REWARD_DESIGN,
    project_name = project_name,
    step_num = step_num
    )

if_Theta_fixed = False
if_G_fixed = False
if_BS = False
if_robust = True


# 2 init RL Agent
agent_1_param_dic = {}
agent_1_param_dic["alpha"] = 0.0001
agent_1_param_dic["beta"] = 0.001
agent_1_param_dic["input_dims"] = system.get_system_state_dim()
agent_1_param_dic["tau"] = 0.001
agent_1_param_dic["batch_size"] = 64
agent_1_param_dic["n_actions"] = system.get_system_action_dim() - 2
agent_1_param_dic["action_noise_factor"] = 0.1
agent_1_param_dic["memory_max_size"] = int(5/5 * episode_num * step_num) #/2
agent_1_param_dic["agent_name"] = "G_and_Phi"
agent_1_param_dic["layer1_size"] = 800
agent_1_param_dic["layer2_size"] = 600
agent_1_param_dic["layer3_size"] = 512
agent_1_param_dic["layer4_size"] = 256

agent_2_param_dic = {}
agent_2_param_dic["alpha"] = 0.0001
agent_2_param_dic["beta"] = 0.001
agent_2_param_dic["input_dims"] = 3
agent_2_param_dic["tau"] = 0.001
agent_2_param_dic["batch_size"] = 64
agent_2_param_dic["n_actions"] = 2
agent_2_param_dic["action_noise_factor"] = 0.5
agent_2_param_dic["memory_max_size"] = int(5/5 * episode_num * step_num) #/2
agent_2_param_dic["agent_name"] = "UAV"
agent_2_param_dic["layer1_size"] = 400
agent_2_param_dic["layer2_size"] = 300
agent_2_param_dic["layer3_size"] = 256
agent_2_param_dic["layer4_size"] = 128

if SEEDS is not None:
    torch.manual_seed(SEEDS[0]) # 1
    torch.cuda.manual_seed_all(SEEDS[0]) # 1
agent_1 = Agent(
    alpha       = agent_1_param_dic["alpha"],
    beta        = agent_1_param_dic["beta"],
    input_dims  = [agent_1_param_dic["input_dims"]],
    tau         = agent_1_param_dic["tau"],
    env         = system,
    batch_size  = agent_1_param_dic["batch_size"],
    layer1_size=agent_1_param_dic["layer1_size"],
    layer2_size=agent_1_param_dic["layer2_size"], 
    layer3_size=agent_1_param_dic["layer3_size"],
    layer4_size=agent_1_param_dic["layer4_size"],
    n_actions   = agent_1_param_dic["n_actions"],
    max_size = agent_1_param_dic["memory_max_size"],
    agent_name= agent_1_param_dic["agent_name"]
    ) 

if SEEDS is not None:
    torch.manual_seed(SEEDS[-1]) # 2
    torch.cuda.manual_seed_all(SEEDS[-1]) # 2
agent_2 = Agent(
    alpha       = agent_2_param_dic["alpha"],
    beta        = agent_2_param_dic["beta"],
    input_dims  = [agent_2_param_dic["input_dims"]],
    tau         = agent_2_param_dic["tau"],
    env         = system,
    batch_size  = agent_2_param_dic["batch_size"],
    layer1_size=agent_2_param_dic["layer1_size"],
    layer2_size=agent_2_param_dic["layer2_size"], 
    layer3_size=agent_2_param_dic["layer3_size"],
    layer4_size=agent_2_param_dic["layer4_size"],
    n_actions   = agent_2_param_dic["n_actions"],
    max_size = agent_2_param_dic["memory_max_size"],
    agent_name= agent_2_param_dic["agent_name"]
    ) 


if TRAINED_UAV:
    benchmark = f'data/storage/benchmark/{DRL_ALGO}_{REWARD_DESIGN}_benchmark'
    if DRL_ALGO == 'td3':
        agent_2.load_models(
             load_file_actor = benchmark + '/Actor_UAV_TD3',
             load_file_critic_1 = benchmark + '/Critic_1_UAV_TD3',
             load_file_critic_2 = benchmark + '/Critic_2_UAV_TD3'
             )
    elif DRL_ALGO == 'ddpg':
        agent_2.load_models(
             load_file_actor = benchmark + '/Actor_UAV_ddpg',
             load_file_critic = benchmark + '/Critic_UAV_ddpg'
             )

meta_dic = {}
print("***********************system information******************************")
print("folder_name:     "+str(system.data_manager.store_path))
meta_dic['folder_name'] = system.data_manager.store_path
print("user_num:        "+str(system.user_num))
meta_dic['user_num'] = system.user_num
print("if_dir:          "+str(system.if_dir_link))
meta_dic['if_dir_link'] = system.if_dir_link
print("if_with_RIS:     "+str(system.if_with_RIS))
meta_dic['if_with_RIS'] = system.if_with_RIS
print("if_user_m:       "+str(system.if_move_users))
meta_dic['if_move_users'] = system.if_move_users
print("RIS_ant_num:     "+str(system.RIS.ant_num))
meta_dic['system_RIS_ant_num'] = system.RIS.ant_num
print("UAV_ant_num:     "+str(system.UAV.ant_num))
meta_dic['system_UAV_ant_num'] = system.UAV.ant_num
print("if_movements:    "+str(system.if_movements))
meta_dic['system_if_movements'] = system.if_movements
print("reverse_x_y:     "+str(system.reverse_x_y))
meta_dic['system_reverse_x_y'] = system.reverse_x_y
print("if_UAV_pos_state:"+str(system.if_UAV_pos_state))
meta_dic['if_UAV_pos_state'] = system.if_UAV_pos_state

print("ep_num:          "+str(episode_num))
meta_dic['episode_num'] = episode_num
print("step_num:        "+str(step_num))
meta_dic['step_num'] = step_num
print("***********************agent_1 information******************************")
tplt = "{0:{2}^20}\t{1:{2}^20}"
for i in agent_1_param_dic:
    parm = agent_1_param_dic[i]
    print(tplt.format(i, parm, chr(12288)))
meta_dic["agent_1"] = agent_1_param_dic

print("***********************agent_2 information******************************")
for i in agent_2_param_dic:
    parm = agent_2_param_dic[i]
    print(tplt.format(i, parm, chr(12288)))
meta_dic["agent_2"] = agent_2_param_dic

system.data_manager.save_meta_data(meta_dic)

print("***********************traning information******************************")


while episode_cnt < episode_num:
    # 1 reset the whole system
    system.reset()
    step_cnt = 0
    score_per_ep = 0

    # 2 get the initial state
    if if_robust:
        tmp = system.observe()
        #z = np.random.multivariate_normal(np.zeros(2), 0.5*np.eye(2), size=len(tmp)).view(np.complex128)
        z = np.random.normal(size=len(tmp))
        observersion_1 = list(
            np.array(tmp) + 0.6 *1e-7* z
            )
    else:
        observersion_1 = system.observe()
    observersion_2 = list(system.UAV.coordinate)
    if episode_cnt == 80:
        print("break point")
    while step_cnt < step_num:
        # 1 count num of step in one episode
        step_cnt += 1
        # judge if pause the whole system
        if not system.render_obj.pause:
            # 2 choose action acoording to current state
            action_1 = agent_1.choose_action(observersion_1, greedy=agent_1_param_dic["action_noise_factor"] * math.pow((1-episode_cnt / episode_num), 2))
            action_2 = agent_2.choose_action(observersion_2, greedy=agent_2_param_dic["action_noise_factor"]* math.pow((1-episode_cnt / episode_num), 2))
            if if_BS:
                action_2[0]=0
                action_2[1]=0

            if if_Theta_fixed:
                action_1[0+2 * system.UAV.ant_num * system.user_num:] = len(action_1[0+2 * system.UAV.ant_num * system.user_num:])*[0]
                
            if if_G_fixed:
                action_1[0:0+2 * system.UAV.ant_num * system.user_num]=np.array([-0.0313, -0.9838, 0.3210, 1.0, -0.9786, -0.1448, 0.3518, 0.5813, -1.0, -0.2803, -0.4616, -0.6352, -0.1449, 0.7040, 0.4090, -0.8521]) * math.pow(episode_cnt / episode_num, 2) * 0.7
                #action_1[0:0+2 * system.UAV.ant_num * system.user_num]=len(action_1[0:0+2 * system.UAV.ant_num * system.user_num])*[0.5]
            # 3 get newstate, reward
            if system.if_with_RIS:
                new_state_1, reward, done, info = system.step(
                    action_0=action_2[0],
                    action_1=action_2[1],
                    G=action_1[0:0+2 * system.UAV.ant_num * system.user_num],
                    Phi=action_1[0+2 * system.UAV.ant_num * system.user_num:],
                    set_pos_x=action_2[0],
                    set_pos_y=action_2[1]
                )
                new_state_2 = list(system.UAV.coordinate)
            else:
                new_state_1, reward, done, info = system.step(
                    action_0=action_2[0],
                    action_1=action_2[1],
                    G=action_1[0:0+2 * system.UAV.ant_num * system.user_num],
                    set_pos_x=action_2[0],
                    set_pos_y=action_2[1]
                )
                new_state_2 = list(system.UAV.coordinate)

            score_per_ep += reward
            # 4 store state pair into mem pool
            agent_1.remember(observersion_1, action_1, reward, new_state_1, int(done))
            agent_2.remember(observersion_2, action_2, reward, new_state_2, int(done))
            # 5 update DDPG net
            agent_1.learn()
            if not TRAINED_UAV:
                agent_2.learn()

            #system.render_obj.render(0.001) # no rendering for faster
            observersion_1 = new_state_1
            observersion_2 = new_state_2
            if done == True:
                break
            
        else:
            #system.render_obj.render_pause()  # no rendering for faster
            time.sleep(0.001) #time.sleep(1)
    system.data_manager.save_file(episode_cnt=episode_cnt)
    system.reset()
    print("ep_num: "+str(episode_cnt)+"   ep_score:  "+str(score_per_ep))
    episode_cnt +=1
    if episode_cnt % 10 == 0:
        agent_1.save_models()
        agent_2.save_models()

# save the last model
agent_1.save_models()
agent_2.save_models()


================================================
FILE: math_tool.py
================================================
import math
import cmath
import numpy as np
import pandas as pd
from numpy import *
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.patches import FancyArrowPatch
from mpl_toolkits.mplot3d import proj3d

def cartesian_coordinate_to_spherical_coordinate(cartesian_coordinate):
    """
    transmit cartesian_coordinate_to_spherical_coordinate
    input 1 X 3 np.array,   [x, y, z]
    output 1 X 3 np.array,  [r, theta, fai]
    """
    r = np.linalg.norm(cartesian_coordinate)
    if abs(cartesian_coordinate[2]) < 1e-8:
        theta = math.atan(np.linalg.norm(cartesian_coordinate[0:2])/1e-8)
    else:
        theta = math.atan(np.linalg.norm(cartesian_coordinate[0:2])/cartesian_coordinate[2])
    
    if abs(cartesian_coordinate[0]) < 1e-8:
        x = 1e-8
    else:
        x = cartesian_coordinate[0]

    y = cartesian_coordinate[1]    
    if abs(y) < 1e-8:
        y = 1e-8

    if y > 0 and x > 0:
        fai = math.atan(y/x)
    elif x < 0 and y > 0:
        fai = math.atan(y/x) + math.pi
    elif x < 0 and y < 0:
        fai = math.atan(y/x) - math.pi
    else:
        fai = math.atan(y/x)
    return r, theta, fai

def vecter_normalization(cartesian_coordinate):
    return cartesian_coordinate/np.linalg.norm(cartesian_coordinate)

def get_coor_ref(coor_sys, coor):
    """
    input:  coor_sys: normalized 1,3 np.array list (1,3)
            coor: coordinate under earth system
    output: referenced coordinate for x,y, normalized 1,3 np.array
    """
    x_ref = np.dot(coor_sys[0],coor)
    y_ref = np.dot(coor_sys[1],coor)
    z_ref = np.dot(coor_sys[2],coor)
    return np.array([x_ref, y_ref, z_ref])

def dB_to_normal(dB):
    """
    input: dB
    output: normal vaule
    """
    return math.pow(10, (dB/10))

def normal_to_dB(normal):
    """
    input: normal
    output: dB value
    """
    return -10 * math.log10(normal)

def diag_to_vector(diag):
    """
    transfer a diagnal matrix into a vector
    """
    vec_size = np.shape(diag)[0]
    vector = np.mat(np.zeros((vec_size, 1), dtype=complex), dtype=complex)
    for i in range(vec_size):
        vector[i, 0] = diag[i, i]
    return vector
    
def vector_to_diag(vector):
    """
    transfer a vector into a diagnal matrix
    """
    vec_size = np.shape(vector)[0]
    diag = np.mat(np.zeros((vec_size, vec_size), dtype=complex), dtype=complex)
    for i in range(vec_size):
        diag[i, i] = vector[i, 0]
    return diag

def bigger_than_zero(value):
    """
    max(0,value)
    """
    return max(0, value)

def dataframe_to_dictionary(df):
    """
    docstring
    """
    return {col_name : df[col_name].values for col_name in df.columns.values}

def convert_list_to_complex_matrix(list_real, shape):
    """
    list_real is a 2* N*K dim list, convert it to N X K complex matrix
    shape is a tuple (N, K)
    """
    N = shape[0]
    K = shape[1]
    matrix_complex =np.mat(np.zeros((N, K), dtype=complex), dtype=complex) 
    for i in range(N):
        for j in range(K):
            matrix_complex[i, j] = list_real[2*(i*K + j)] + 1j * list_real[2*(i*K + j) + 1]
            
    return matrix_complex

def convert_list_to_complex_diag(list_real, diag_row_num):
    """
    list_real is a M dim list, convert it to M X M complex diag matrix
    diag_row_num is the M
    """
    M = diag_row_num
    diag_matrix_complex = np.mat(np.zeros((M, M), dtype=complex), dtype=complex) 
    for i in range(M):
        diag_matrix_complex[i, i] = cmath.exp(1j * list_real[i] * math.pi)
    return diag_matrix_complex

def map_to(x, x_range:tuple, y_range:tuple):
    x_min = x_range[0]
    x_max = x_range[1]
    y_min = y_range[0]
    y_max = y_range[1]
    y = y_min+(y_max - y_min) / (x_max - x_min) * (x - x_min)
    return y



================================================
FILE: plot_see.py
================================================
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
import matplotlib.pyplot as plt
import numpy as np
import cmath
from scipy.io import loadmat, savemat
import pandas as pd
import os
import copy
import math


######################################################
# new for energy 
# energy related parameters of rotary-wing UAV
# based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV
P_i = 790.6715
P_0 = 580.65
U2_tip = (200) ** 2
s = 0.05
d_0 = 0.3
p = 1.225
A = 0.79
delta_time = 0.1 #0.1/1000 #0.1ms

# add ons hover veloctiy
# based on https://www.intechopen.com/chapters/57483
m = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F
g = 9.81 # gravity
T = m * g # thrust
v_0 = (T / (A * 2 * p)) ** 0.5

def get_energy_consumption(v_t):
    '''
    arg
    1) v_t = displacement per time slot
    '''
    energy_1 = P_0 \
                + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \
                + 0.5 * d_0 * p * s * A * (abs(v_t))**3
    
    energy_2 = P_i * ((
                    (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \
                    - (abs(v_t) ** 2) / (2 * (v_0 **2)) \
                ) ** 0.5)
    
    energy = delta_time * (energy_1 + energy_2)
    return energy 

ENERGY_MIN = get_energy_consumption(0.25)
ENERGY_MAX = get_energy_consumption(0)

######################################################


# modified from data_manager.py
init_data_file = 'data/init_location.xlsx'
def read_init_location(entity_type = 'user', index = 0):
    if entity_type == 'user' or 'attacker' or 'RIS' or 'RIS_norm_vec' or 'UAV':
        return np.array([\
        pd.read_excel(init_data_file, sheet_name=entity_type)['x'][index],\
        pd.read_excel(init_data_file, sheet_name=entity_type)['y'][index],\
        pd.read_excel(init_data_file, sheet_name=entity_type)['z'][index]])
    else:
        return None


# load and plot everything
class LoadAndPlot(object):
    """
    load date and plot 2022-07-22 16_16_26
    """
    def __init__(self, store_paths, \
                       user_num = 2, attacker_num = 1, RIS_ant_num = 4, \
                       ep_num = 300, step_num = 100): # RIS_ant_num = 16 (not true)

        self.store_paths = store_paths
        
        self.color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'y']
#        self.store_paths = store_paths + '//'
        self.user_num = user_num
        self.attacker_num = attacker_num
        self.RIS_ant_num = RIS_ant_num
        self.ep_num = ep_num
        self.step_num = step_num


    def load_one_ep(self, file_name):
        m = loadmat(self.store_path + file_name)
        return m


    def load_all_steps(self):
        result_dic = {}
        result_dic.update({'reward':[]})

        result_dic.update({'user_capacity':[]})
        for i in range(self.user_num):
            result_dic['user_capacity'].append([])

        result_dic.update({'secure_capacity':[]})
        for i in range(self.user_num):
            result_dic['secure_capacity'].append([])

        result_dic.update({'attaker_capacity':[]})
        for i in range(self.attacker_num):
            result_dic['attaker_capacity'].append([])
        
        result_dic.update({'RIS_elements':[]})
        for i in range(self.RIS_ant_num):
            result_dic['RIS_elements'].append([])

        for ep_cnt in range(self.ep_num):
            mat_ep = self.load_one_ep("simulation_result_ep_" + str(ep_cnt) + ".mat")

            one_ep_reward = mat_ep["result_" + str(ep_cnt)]["reward"][0][0]
            result_dic['reward'] += list(one_ep_reward[:, 0])

            one_ep_user_capacity = mat_ep["result_" + str(ep_cnt)]["user_capacity"][0][0]
            for i in range(self.user_num):
                result_dic['user_capacity'][i] += list(one_ep_user_capacity[:, i])
            
            one_ep_secure_capacity = mat_ep["result_" + str(ep_cnt)]["secure_capacity"][0][0]
            for i in range(self.user_num):
                result_dic['secure_capacity'][i] += list(one_ep_secure_capacity[:, i])
            
            one_ep_attaker_capacity = mat_ep["result_" + str(ep_cnt)]["attaker_capacity"][0][0]
            for i in range(self.attacker_num):
                result_dic['attaker_capacity'][i] += list(one_ep_attaker_capacity[:, i])

            one_ep_RIS_first_element = mat_ep["result_" + str(ep_cnt)]["reflecting_coefficient"][0][0]
            for i in range(self.RIS_ant_num):
                result_dic['RIS_elements'][i] += list(one_ep_RIS_first_element[:, i])

        return result_dic


    def plot(self):
        """
        plot result
        b--blue c--cyan(青色） g--green k--black m--magenta（紫红色） r--red w--white y--yellow 
        """

        
      
        ###############################
        # plot average secrecy energy efficient of each episode
        ###############################
        fig = plt.figure('average_secrecy_energy_efficiency')

        # get init location
        init_uav_coord = read_init_location(entity_type = 'UAV')
        init_user_coord_0 = read_init_location(entity_type = 'user', index=0)
        init_user_coord_1 = read_init_location(entity_type = 'user', index=1)       
        
        # paths
        legends = ['TDDRL', 'TTD3', 'TDDRL (Energy Penalty)', 'TTD3 (Energy Penalty)']
        all_average_see = []
        all_energies = []
        
        # energies
        for store_path in self.store_paths:
            energies = []
            for i in range(self.ep_num):
                # read the mat file
                filename = f'simulation_result_ep_{i}.mat'
                filename = os.path.join(store_path, filename)
                data = loadmat(filename)
            
                # v_ts
                energies_one_episode = []
            
                # loop all uav movt
                uav_movt = data[f'result_{i}'][0][0][-1]
                for j in range(uav_movt.shape[0]):
                    move_x = uav_movt[j][0]
                    move_y = uav_movt[j][1]
                    v_t = (move_x ** 2 + move_y ** 2) ** 0.5
                    energy = get_energy_consumption(v_t / delta_time)
                    energies_one_episode.append(energy)
                energies.append(energies_one_episode)
            all_energies.append(energies)
        
        # see
        for store_path, legend in zip(self.store_paths, legends):
            average_see = []
            # ssr
            self.store_path = store_path + '//'
            self.all_steps = self.load_all_steps()
            
            sum_secrecy_rate = np.array(self.all_steps['secure_capacity'])
            sum_secrecy_rate = np.sum(sum_secrecy_rate, axis = 0)

            # energy
            energies = all_energies.pop(0)
            for i in range(0, self.ep_num * self.step_num, self.step_num):
                ssr_one_episode = sum_secrecy_rate[i:i+self.step_num] # ssr means Sum Secrecy Rate
                energies_one_episode = energies.pop(0)
                ssr_one_episode = ssr_one_episode[:len(energies_one_episode)]
                energies_one_episode = energies_one_episode[:len(ssr_one_episode)]
                try:
                    see = np.array(ssr_one_episode) / np.array(energies_one_episode)
                    average_see.append(sum(see)/len(see))
                except:
                    average_see.append(0)
            
            # change from /J to /kJ
            average_see = np.array(average_see) * 1000
            average_see = list(average_see)
            
            all_average_see.append(average_see)  
            plt.plot(range(len(average_see)), average_see, label=legend)
            plt.xlabel("Episodes (Ep)")
            plt.ylabel("Average Secrecy Energy Efficiency")
                
        plt.legend()
        plt.savefig('data/average_secrecy_energy_efficiency.png')

        
        # dictionary of lists  
        dict = {legend: average_see for legend, average_see in zip(legends, all_average_see)} 
        df = pd.DataFrame(dict)
        df.to_excel('data/average_secrecy_energy_efficiency.xlsx', index=False) 

        
if __name__ == '__main__':
    LoadPlotObject = LoadAndPlot(
            store_paths = ['data/storage/scratch/ddpg_ssr', 'data/storage/scratch/td3_ssr', 'data/storage/scratch/ddpg_see', 'data/storage/scratch/td3_see'],
            ep_num = 300,
         )
    LoadPlotObject.plot()

    



================================================
FILE: plot_ssr.py
================================================
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
import matplotlib.pyplot as plt
import numpy as np
import cmath
from scipy.io import loadmat, savemat
import pandas as pd
import os
import copy
import math


######################################################
# new for energy 
# energy related parameters of rotary-wing UAV
# based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV
P_i = 790.6715
P_0 = 580.65
U2_tip = (200) ** 2
s = 0.05
d_0 = 0.3
p = 1.225
A = 0.79
delta_time = 0.1 #0.1/1000 #0.1ms

# add ons hover veloctiy
# based on https://www.intechopen.com/chapters/57483
m = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F
g = 9.81 # gravity
T = m * g # thrust
v_0 = (T / (A * 2 * p)) ** 0.5

def get_energy_consumption(v_t):
    '''
    arg
    1) v_t = displacement per time slot
    '''
    energy_1 = P_0 \
                + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \
                + 0.5 * d_0 * p * s * A * (abs(v_t))**3
    
    energy_2 = P_i * ((
                    (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \
                    - (abs(v_t) ** 2) / (2 * (v_0 **2)) \
                ) ** 0.5)
    
    energy = delta_time * (energy_1 + energy_2)
    return energy 

ENERGY_MIN = get_energy_consumption(0.25)
ENERGY_MAX = get_energy_consumption(0)

######################################################


# modified from data_manager.py
init_data_file = 'data/init_location.xlsx'
def read_init_location(entity_type = 'user', index = 0):
    if entity_type == 'user' or 'attacker' or 'RIS' or 'RIS_norm_vec' or 'UAV':
        return np.array([\
        pd.read_excel(init_data_file, sheet_name=entity_type)['x'][index],\
        pd.read_excel(init_data_file, sheet_name=entity_type)['y'][index],\
        pd.read_excel(init_data_file, sheet_name=entity_type)['z'][index]])
    else:
        return None


# load and plot everything
class LoadAndPlot(object):
    """
    load date and plot 2022-07-22 16_16_26
    """
    def __init__(self, store_paths, \
                       user_num = 2, attacker_num = 1, RIS_ant_num = 4, \
                       ep_num = 300, step_num = 100): # RIS_ant_num = 16 (not true)
        
        self.store_paths = store_paths

        self.color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'y']
#        self.store_path = store_path + '//'
        self.user_num = user_num
        self.attacker_num = attacker_num
        self.RIS_ant_num = RIS_ant_num
        self.ep_num = ep_num
        self.step_num = step_num


    def load_one_ep(self, file_name):
        m = loadmat(self.store_path + file_name)
        return m


    def load_all_steps(self):
        result_dic = {}
        result_dic.update({'reward':[]})

        result_dic.update({'user_capacity':[]})
        for i in range(self.user_num):
            result_dic['user_capacity'].append([])

        result_dic.update({'secure_capacity':[]})
        for i in range(self.user_num):
            result_dic['secure_capacity'].append([])

        result_dic.update({'attaker_capacity':[]})
        for i in range(self.attacker_num):
            result_dic['attaker_capacity'].append([])
        
        result_dic.update({'RIS_elements':[]})
        for i in range(self.RIS_ant_num):
            result_dic['RIS_elements'].append([])

        for ep_cnt in range(self.ep_num):
            mat_ep = self.load_one_ep("simulation_result_ep_" + str(ep_cnt) + ".mat")

            one_ep_reward = mat_ep["result_" + str(ep_cnt)]["reward"][0][0]
            result_dic['reward'] += list(one_ep_reward[:, 0])

            one_ep_user_capacity = mat_ep["result_" + str(ep_cnt)]["user_capacity"][0][0]
            for i in range(self.user_num):
                result_dic['user_capacity'][i] += list(one_ep_user_capacity[:, i])
            
            one_ep_secure_capacity = mat_ep["result_" + str(ep_cnt)]["secure_capacity"][0][0]
            for i in range(self.user_num):
                result_dic['secure_capacity'][i] += list(one_ep_secure_capacity[:, i])
            
            one_ep_attaker_capacity = mat_ep["result_" + str(ep_cnt)]["attaker_capacity"][0][0]
            for i in range(self.attacker_num):
                result_dic['attaker_capacity'][i] += list(one_ep_attaker_capacity[:, i])

            one_ep_RIS_first_element = mat_ep["result_" + str(ep_cnt)]["reflecting_coefficient"][0][0]
            for i in range(self.RIS_ant_num):
                result_dic['RIS_elements'][i] += list(one_ep_RIS_first_element[:, i])

        return result_dic


    def plot(self):
        """
        plot result
        b--blue c--cyan(青色） g--green k--black m--magenta（紫红色） r--red w--white y--yellow 
        """

        
      
        ###############################
        # plot average sum secrecy rate of each episode
        ###############################
        fig = plt.figure('average_sum_secrecy_rate')
        
#        store_paths = ['data/storage/ddpg 4', 'data/storage/td3 3', 'data/storage/ddpg seem 6', 'data/storage/td3 seem 2']
      #  store_paths = r'data/storage/scratch/td3_ssr'
        legends = ['TDDRL', 'TTD3', 'TDDRL (Energy Penalty)', 'TTD3 (Energy Penalty)']
        all_average_sum_secrecy_rate = []
        for store_path, legend in zip(self.store_paths, legends):
            self.store_path = store_path + '//'
            self.all_steps = self.load_all_steps()
            
            sum_secrecy_rate = np.array(self.all_steps['secure_capacity'])
            sum_secrecy_rate = np.sum(sum_secrecy_rate, axis = 0)
            average_sum_secrecy_rate = []
            ssr = []
            for i in range(0, self.ep_num * self.step_num, self.step_num):
                ssr_one_episode = sum_secrecy_rate[i:i+self.step_num] # ssr means Sum Secrecy Rate
                ssr.append(ssr_one_episode)
                try:
                    _ = sum(ssr_one_episode) / len(ssr_one_episode)
                except:
                    _ = 0
                average_sum_secrecy_rate.append(_)
                
            all_average_sum_secrecy_rate.append(average_sum_secrecy_rate)  
            plt.plot(range(len(average_sum_secrecy_rate)), average_sum_secrecy_rate, label=legend)
            plt.xlabel("Episodes (Ep)")
            plt.ylabel("Average Sum Secrecy Rate")
                
        plt.legend()
        plt.savefig('data/average_sum_secrecy_rate.png')
        
        
        # transpose
        '''
        all_average_sum_secrecy_rate = np.array(all_average_sum_secrecy_rate)
        all_average_sum_secrecy_rate = np.transpose(all_average_sum_secrecy_rate)
        all_average_sum_secrecy_rate = list(all_average_sum_secrecy_rate)
        '''
        
        # dictionary of lists  
        dict = {legend: average_sum_secrecy_rate for legend, average_sum_secrecy_rate in zip(legends, all_average_sum_secrecy_rate)} 
        df = pd.DataFrame(dict)
        df.to_excel('data/average_sum_secrecy_rate.xlsx', index=False) 

        
if __name__ == '__main__':
    LoadPlotObject = LoadAndPlot(
        store_paths = ['data/storage/scratch/ddpg_ssr', 'data/storage/scratch/td3_ssr', 'data/storage/scratch/ddpg_see', 'data/storage/scratch/td3_see'],
        ep_num = 300,
        )
    LoadPlotObject.plot()

    



================================================
FILE: plot_traj.py
================================================
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 100
import matplotlib.pyplot as plt
import numpy as np
import cmath
from scipy.io import loadmat, savemat
import pandas as pd
import os
import copy
import math


######################################################
# new for energy 
# energy related parameters of rotary-wing UAV
# based on Energy Minimization in Internet-of-Things System Based on Rotary-Wing UAV
P_i = 790.6715
P_0 = 580.65
U2_tip = (200) ** 2
s = 0.05
d_0 = 0.3
p = 1.225
A = 0.79
delta_time = 0.1 #0.1/1000 #0.1ms

# add ons hover veloctiy
# based on https://www.intechopen.com/chapters/57483
m = 1.3 # mass: assume 1.3kg https://www.droneblog.com/average-weights-of-common-types-of-drones/#:~:text=In%20most%20cases%2C%20toy%20drones,What%20is%20this%3F
g = 9.81 # gravity
T = m * g # thrust
v_0 = (T / (A * 2 * p)) ** 0.5

def get_energy_consumption(v_t):
    '''
    arg
    1) v_t = displacement per time slot
    '''
    energy_1 = P_0 \
                + 3 * P_0 * (abs(v_t)) ** 2 / U2_tip \
                + 0.5 * d_0 * p * s * A * (abs(v_t))**3
    
    energy_2 = P_i * ((
                    (1 + (abs(v_t) ** 4) / (4 * (v_0 ** 4))) ** 0.5 \
                    - (abs(v_t) ** 2) / (2 * (v_0 **2)) \
                ) ** 0.5)
    
    energy = delta_time * (energy_1 + energy_2)
    return energy 

ENERGY_MIN = get_energy_consumption(0.25)
ENERGY_MAX = get_energy_consumption(0)

######################################################


# modified from data_manager.py
init_data_file = 'data/init_location.xlsx'
def read_init_location(entity_type = 'user', index = 0):
    if entity_type == 'user' or 'attacker' or 'RIS' or 'RIS_norm_vec' or 'UAV':
        return np.array([\
        pd.read_excel(init_data_file, sheet_name=entity_type)['x'][index],\
        pd.read_excel(init_data_file, sheet_name=entity_type)['y'][index],\
        pd.read_excel(init_data_file, sheet_name=entity_type)['z'][index]])
    else:
        return None


# load and plot everything
class LoadAndPlot(object):
    """
    load date and plot 2022-07-22 16_16_26
    """
    def __init__(self, store_paths, \
                       user_num = 2, attacker_num = 1, RIS_ant_num = 4, \
                       ep_num = 300, step_num = 100): # RIS_ant_num = 16 (not true)

        self.store_paths = store_paths
        self.color_list = ['b', 'c', 'g', 'k', 'm', 'r', 'y']
#        self.store_path = store_path + '//'
        self.user_num = user_num
        self.attacker_num = attacker_num
        self.RIS_ant_num = RIS_ant_num
        self.ep_num = ep_num
        self.step_num = step_num


    def load_one_ep(self, file_name):
        m = loadmat(self.store_path + file_name)
        return m


    def load_all_steps(self):
        result_dic = {}
        result_dic.update({'reward':[]})

        result_dic.update({'user_capacity':[]})
        for i in range(self.user_num):
            result_dic['user_capacity'].append([])

        result_dic.update({'secure_capacity':[]})
        for i in range(self.user_num):
            result_dic['secure_capacity'].append([])

        result_dic.update({'attaker_capacity':[]})
        for i in range(self.attacker_num):
            result_dic['attaker_capacity'].append([])
        
        result_dic.update({'RIS_elements':[]})
        for i in range(self.RIS_ant_num):
            result_dic['RIS_elements'].append([])

        for ep_cnt in range(self.ep_num):
            mat_ep = self.load_one_ep("simulation_result_ep_" + str(ep_cnt) + ".mat")

            one_ep_reward = mat_ep["result_" + str(ep_cnt)]["reward"][0][0]
            result_dic['reward'] += list(one_ep_reward[:, 0])

            one_ep_user_capacity = mat_ep["result_" + str(ep_cnt)]["user_capacity"][0][0]
            for i in range(self.user_num):
                result_dic['user_capacity'][i] += list(one_ep_user_capacity[:, i])
            
            one_ep_secure_capacity = mat_ep["result_" + str(ep_cnt)]["secure_capacity"][0][0]
            for i in range(self.user_num):
                result_dic['secure_capacity'][i] += list(one_ep_secure_capacity[:, i])
            
            one_ep_attaker_capacity = mat_ep["result_" + str(ep_cnt)]["attaker_capacity"][0][0]
            for i in range(self.attacker_num):
                result_dic['attaker_capacity'][i] += list(one_ep_attaker_capacity[:, i])

            one_ep_RIS_first_element = mat_ep["result_" + str(ep_cnt)]["reflecting_coefficient"][0][0]
            for i in range(self.RIS_ant_num):
                result_dic['RIS_elements'][i] += list(one_ep_RIS_first_element[:, i])

        return result_dic


    def plot(self):
        """
        plot result
        b--blue c--cyan(青色） g--green k--black m--magenta（紫红色） r--red w--white y--yellow 
        """

        
      
        ###############################
        # plot trajectory
        ###############################
        # create a fig
        fig, ax = plt.subplots(figsize=(5.4,5.2))
        #fig = plt.figure('trajectory')
        MARKER_SIZE = 8

        # colour
        color_list_template = ['b', 'g', 'c', 'k', 'm', 'r', 'y', 'black', 'red']
        color_list_template = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
        color_list = copy.deepcopy(color_list_template)

        # get init location
        init_uav_coord = read_init_location(entity_type = 'UAV')
        init_ris_coord = read_init_location(entity_type = 'RIS')
        init_eaves_coord = read_init_location(entity_type = 'attacker')
        init_user_coord_0 = read_init_location(entity_type = 'user', index=0)
        init_user_coord_1 = read_init_location(entity_type = 'user', index=1)
    
        
        plt.text(20, init_uav_coord[0]-1, 'UAV Initial Coordinate', fontsize = 11)
        plt.plot([init_uav_coord[1]], [init_uav_coord[0]], marker="s", markersize=MARKER_SIZE, markeredgecolor="black", markerfacecolor="none")
        
        plt.text(46, init_ris_coord[0]-1, 'RIS', fontsize = 11)
        plt.plot([init_ris_coord[1]], [init_ris_coord[0]], marker="d", markersize=MARKER_SIZE, markeredgecolor="black", markerfacecolor="none")
        
        plt.text(36, init_eaves_coord[0]-1, 'Eavesdropper', fontsize = 11)
        plt.plot([init_eaves_coord[1]], [init_eaves_coord[0]], marker="v", markersize=MARKER_SIZE, markeredgecolor="black", markerfacecolor="none")
        
        # paths
#        store_paths = ['data/storage/ddpg 2', 'data/storage/td3 3', 'data/storage/ddpg seem 3', 'data/storage/td3 seem 5']
#        store_paths = ['data/storage/ddpg 2', 'data/storage/td3 3', 'data/storage/ddpg seem 3', 'data/storage/td3 seem 5']
        legends = ['TDDRL', 'TTD3', 'TDDRL (Energy Penalty)', 'TTD3 (Energy Penalty)']
        legends = ['Benchmark 1', 'Benchmark 2', 'Benchmark 3', 'Proposed method']
     
        for store_path, legend in zip(self.store_paths, legends):
            # read the mat file
            i = 5 - 1 # episode 300
            filename = f'simulation_result_ep_{i}.mat'
            filename = os.path.join(store_path, filename)
            data = loadmat(filename)
        
            # uav movt
            uav_coord = [ [init_uav_coord[0]], [init_uav_coord[1]] ]
        
            uav_movt = data[f'result_{i}'][0][0][-1]
            for j in range(uav_movt.shape[0]):
                move_x = uav_movt[j][0]
                move_y = uav_movt[j][1]
        
                prev_x = uav_coord[0][-1]
                prev_y = uav_coord[1][-1]
        
                current_x = prev_x + move_x
                current_y = prev_y + move_y
        
                uav_coord[0].append(current_x)
                uav_coord[1].append(current_y)
            plt.plot(uav_coord[1],uav_coord[0], c=color_list.pop(0), label=legend)
        
        # user 0 movt
        direction_fai = -1/2*math.pi 
        distance_delta_d = 0.25
        user_coord_0 = [ [init_user_coord_0[0]], [init_user_coord_0[1]] ]
        plt.text(29, init_user_coord_0[0]-1, 'User 1 Initial Coordinate', fontsize = 11)
        #color_list = copy.deepcopy(color_list_template)
        for j in range(uav_movt.shape[0]):
            delta_x = distance_delta_d * math.cos(direction_fai)
            delta_y = distance_delta_d * math.sin(direction_fai)
        
            prev_x = user_coord_0[0][-1]
            prev_y = user_coord_0[1][-1]
        
            current_x = prev_x + delta_x
            current_y = prev_y + delta_y
        
            user_coord_0[0].append(current_x)
            user_coord_0[1].append(current_y)
        plt.plot(user_coord_0[1],user_coord_0[0], c=color_list.pop(0), linestyle='dashed', linewidth=2, label='User 1')
        plt.plot(user_coord_0[1][0], user_coord_0[0][0], marker="o", markersize=MARKER_SIZE, markeredgecolor="black", markerfacecolor="none")
        
        
        # user 1 movt
        direction_fai = -1/2*math.pi 
        distance_delta_d = 0.25
        user_coord_1 = [ [init_user_coord_1[0]], [init_user_coord_1[1]] ]
        plt.text(13, init_user_coord_1[0]-1, 'User 2 Initial Coordinate', fontsize = 11)
        #color_list = copy.deepcopy(color_list_template)
        for j in range(uav_movt.shape[0]):
            delta_x = distance_delta_d * math.cos(direction_fai)
            delta_y = distance_delta_d * math.sin(direction_fai)
        
            prev_x = user_coord_1[0][-1]
            prev_y = user_coord_1[1][-1]
        
            current_x = prev_x + delta_x
            current_y = prev_y + delta_y
        
            user_coord_1[0].append(current_x)
            user_coord_1[1].append(current_y)
        plt.plot(user_coord_1[1],user_coord_1[0], c=color_list.pop(0), linestyle='dashed', linewidth=2, label='User 2')
        plt.plot(user_coord_1[1][0], user_coord_1[0][0], marker="o", markersize=MARKER_SIZE, markeredgecolor="black", markerfacecolor="none")
        
        # plot a line between last coord of user 0 and user 1
        plt.plot([user_coord_0[1][-1], user_coord_1[1][0-1]], [user_coord_0[0][-1], user_coord_1[0][-1]], 'gray', linestyle='dashed')
        
        # plot midpoint between last coord of user 0 and user 1
        plt.plot([(user_coord_0[1][-1] + user_coord_1[1][0-1])/2], [(user_coord_0[0][-1] + user_coord_1[0][0-1])/2], marker="o", markersize=MARKER_SIZE, markeredgecolor="black", markerfacecolor="none")
        plt.text(12, 18, "Midpoint of \ntwo user's last location", fontsize = 11)
       
        plt.legend(loc='center right', fontsize=10)
        plt.grid()
        plt.xlim(0, 50)
        plt.ylim(-10, 30)
        plt.xlabel('x(m)')
        plt.ylabel('y(m)')
        plt.tight_layout()
        plt.gca().invert_yaxis()
        plt.savefig('data/trajectory.png')
        #plt.cla()

        
if __name__ == '__main__':
    LoadPlotObject = LoadAndPlot(
        store_paths = ['data/storage/scratch/ddpg_ssr', 'data/storage/scratch/td3_ssr', 'data/storage/scratch/ddpg_see', 'data/storage/scratch/td3_see'],
        ep_num=300,
        )
    LoadPlotObject.plot()

    



================================================
FILE: render.py
================================================
from mpl_toolkits import mplot3d
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.patches import FancyArrowPatch
from mpl_toolkits.mplot3d import proj3d

class Arrow3D(FancyArrowPatch):
    def __init__(self, xs, ys, zs, *args, **kwargs):
        FancyArrowPatch.__init__(self, (0,0), (0,0), *args, **kwargs)
        self._verts3d = xs, ys, zs

    def draw(self, renderer):
        xs3d, ys3d, zs3d = self._verts3d
        xs, ys, zs = proj3d.proj_transform(xs3d, ys3d, zs3d, renderer.M)
        self.set_positions((xs[0],ys[0]),(xs[1],ys[1]))
        FancyArrowPatch.draw(self, renderer)

class Render(object):
    """
    plot functions to render the whole system
    """
    def __init__(self, system, canv_x = (-25, 25), canv_y = (0, 50), canv_z = (0, 60)):
        """
        docstring
        """
        plt.ion()
        self.system = system
        self.fig = plt.figure(1)
        self.pause = False
        self.t_index = 0

    def render_pause(self):
        """
        show whole system by using plt.show
        """
        plt.ion()
        ax = self.plot_config()
        # plot the position of UAV, RIS, Users & Attakers
        self.plot_entities(ax)
        self.plot_channels(ax)
        self.plot_text(ax)
        plt.show(self.fig)
        plt.cla() 
        self.pause = False
        plt.ioff() 

    def render(self, interval):
        """
        show whole system in 3D figure
        """
        plt.ion()
        ax = self.plot_config()
        # plot the position of UAV, RIS, Users & Attakers
        self.plot_entities(ax)
        self.plot_channels(ax)
        self.plot_text(ax)
        plt.pause(interval)
        plt.cla() 
        plt.ioff()

    def plot_click(self, event):
        self.pause ^= True

    def plot_config(self):
        self.fig = plt.figure(1)
        ax = plt.axes(projection='3d')
        ax.set_xlabel('X Label')
        ax.set_ylabel('Y Label')
        ax.set_zlabel('Z Label')
        ax.set_xlim3d(-25, 25)
        ax.set_ylim3d(0, 50)
        ax.set_zlim3d(0, 60)
        ax.view_init(90, 0)
        self.fig.canvas.mpl_connect('key_press_event', self.plot_click)
        return ax    
    
    def plot_entities(self, ax):
        """
        function used in render to show the UAV, RIS, users and attackers
        """
        ax.scatter(\
        self.system.UAV.coordinate[0],\
        self.system.UAV.coordinate[1],\
        self.system.UAV.coordinate[2],\
        color='r')
        ax.text(self.system.UAV.coordinate[0],self.system.UAV.coordinate[1],self.system.UAV.coordinate[2], \
        'UAV', size=15, zorder=1, color='r') 

        ax.scatter(\
        self.system.RIS.coordinate[0],\
        self.system.RIS.coordinate[1],\
        self.system.RIS.coordinate[2],\
        color='g')
        ax.text(self.system.RIS.coordinate[0],self.system.RIS.coordinate[1],self.system.RIS.coordinate[2], \
        'RIS', size=15, zorder=1, color='g') 

        for user in self.system.user_list:
            ax.scatter(
            user.coordinate[0],\
            user.coordinate[1],\
            user.coordinate[2],\
            color='b'
            )
            text = 'user_'+str(user.index) + '\n'\
            + 'noise power(dB)    = ' + str(user.noise_power) + '\n' \
            + 'capacity          = ' + str(user.capacity) + '\n'\
            + 'secure_capacity   = ' + str(user.secure_capacity)
            
            ax.text(user.coordinate[0],user.coordinate[1],user.coordinate[2], \
            text, size=10, zorder=1, color='b') 
        for attacker in self.system.attacker_list:
            ax.scatter(
            attacker.coordinate[0],\
            attacker.coordinate[1],\
            attacker.coordinate[2],\
            color='y'
            )
            ax.text(attacker.coordinate[0],attacker.coordinate[1],attacker.coordinate[2], \
            'attacker_'+str(attacker.index) + '\n'\
            +'capacities:' + str(attacker.capacity), size=10, zorder=1, color='y') 

    def plot_channels(self, ax):
        """
        function used in render to show the H_UR, h_U_k, h_R_k
        """
        for channel in self.system.h_R_k:
            self.plot_one_channel(ax, channel, "b")
        for channel in self.system.h_R_p:
            self.plot_one_channel(ax, channel, "y")
        for channel in self.system.h_U_k:
            self.plot_one_channel(ax, channel, "b")
        for channel in self.system.h_U_p:
            self.plot_one_channel(ax, channel, "y")
            
        self.plot_one_channel(ax, self.system.H_UR, "r")
        
    def plot_one_channel(self, ax, channel, color, text = "channel"):
        """
        function used in plot channels to show only one channel
        """        
        arrow_side_coor = channel.receiver.coordinate
        point_side_coor = channel.transmitter.coordinate

        text = channel.channel_name + '\n' \
        + 'n=' + str(channel.n) \
        + '     sigma=' + str(channel.sigma) +'\n'\
        + 'PL=' + str(channel.path_loss_normal) + '\n'\
        + 'PL(dB)=' + str(channel.path_loss_dB)
        
        x = (arrow_side_coor[0] + point_side_coor[0]) / 2
        y = (arrow_side_coor[1] + point_side_coor[1]) / 2
        z = (arrow_side_coor[2] + point_side_coor[2]) / 2
        ax.text(x, y, z, text, size=10, zorder=1, color=color) 
        
        channel_arrow = Arrow3D(\
        [point_side_coor[0], arrow_side_coor[0]], \
        [point_side_coor[1], arrow_side_coor[1]], \
        [point_side_coor[2], arrow_side_coor[2]],\
        mutation_scale=20, lw = 3, arrowstyle="-|>", color=color
        )
        ax.add_artist(channel_arrow)

    def plot_text(self, ax):
        """
        used in render to polt texts
        """
        text = "pause = " + str(self.pause) + "\n"\
        + "t_index = "    + str(self.t_index)
        ax.text(0, 0, 60, text, size=10, zorder=1, color='b') 

================================================
FILE: requirements.txt
================================================
# Twin-TD3 requirements
# Usage: pip install -r requirements.txt

matplotlib==3.4.3
pyparsing==3.0.9
# sklearn==0.0.post1
joblib==1.2.0
pandas==1.4.3
cycler==0.11.0
seaborn==0.12.2
kiwisolver==1.4.3
torchvision==0.13.0
torch>=1.7.0
scikit-learn==1.2.2
python-dateutil==2.8.2
pytz==2022.1
scipy==1.8.1
openpyxl==3.0.9


================================================
FILE: run_simulation.py
================================================
# debug field
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"

import argparse

# get argument from user
parser = argparse.ArgumentParser()
parser.add_argument('--path', type = str, required = True, help="pretrained model weight path")

args = parser.parse_args()
STORE_PATH = args.path

# validate the weight path
if not os.path.isdir(STORE_PATH):
    raise NotImplementedError("The provided weight path does not exist!")

# get DRL_ALGO
if 'td3' in STORE_PATH:
    DRL_ALGO = 'td3'
else:
    DRL_ALGO = 'ddpg'

# reward design
if 'see' in STORE_PATH:
    REWARD_DESIGN = 'see'
else:
    REWARD_DESIGN = 'ssr'

# seeds and episode number
SEEDS = None

# process the argument
assert DRL_ALGO in ['ddpg', 'td3'], "drl must be ['ddpg', 'td3']"
assert REWARD_DESIGN in ['ssr', 'see'], "reward must be ['ssr', 'see']"
if SEEDS is not None:
    assert len(SEEDS) in [1, 2] and isinstance(SEEDS[0], int) and isinstance(SEEDS[-1], int), "seeds must be a list of 1 or 2 integer"

if DRL_ALGO == 'td3':
    from td3 import Agent
elif DRL_ALGO == 'ddpg':
    from ddpg import Agent
import ddpg

from env import MiniSystem
import numpy as np
import math
import time
import torch
import shutil

# 1 init system model
episode_num = 1
episode_cnt = 0
step_num = 100

project_name = STORE_PATH

system = MiniSystem(
    user_num=2,
    RIS_ant_num=4,
    UAV_ant_num=4,
    if_dir_link=1,
    if_with_RIS=True,
    if_move_users=True,
    if_movements=True,
    reverse_x_y=(False, False),
    if_UAV_pos_state = True,
    reward_design = REWARD_DESIGN,
    project_name = project_name,
    step_num = step_num
    )

if_Theta_fixed = False
if_G_fixed = False
if_BS = False
if_robust = True


# 2 init RL Agent
agent_1_param_dic = {}
agent_1_param_dic["alpha"] = 0.0001
agent_1_param_dic["beta"] = 0.001
agent_1_param_dic["input_dims"] = system.get_system_state_dim()
agent_1_param_dic["tau"] = 0.001
agent_1_param_dic["batch_size"] = 64
agent_1_param_dic["n_actions"] = system.get_system_action_dim() - 2
agent_1_param_dic["action_noise_factor"] = 0.1
agent_1_param_dic["memory_max_size"] = int(5/5 * episode_num * step_num) #/2
agent_1_param_dic["agent_name"] = "G_and_Phi"
agent_1_param_dic["layer1_size"] = 800
agent_1_param_dic["layer2_size"] = 600
agent_1_param_dic["layer3_size"] = 512
agent_1_param_dic["layer4_size"] = 256

agent_2_param_dic = {}
agent_2_param_dic["alpha"] = 0.0001
agent_2_param_dic["beta"] = 0.001
agent_2_param_dic["input_dims"] = 3
agent_2_param_dic["tau"] = 0.001
agent_2_param_dic["batch_size"] = 64
agent_2_param_dic["n_actions"] = 2
agent_2_param_dic["action_noise_factor"] = 0.5
agent_2_param_dic["memory_max_size"] = int(5/5 * episode_num * step_num) #/2
agent_2_param_dic["agent_name"] = "UAV"
agent_2_param_dic["layer1_size"] = 400
agent_2_param_dic["layer2_size"] = 300
agent_2_param_dic["layer3_size"] = 256
agent_2_param_dic["layer4_size"] = 128

if SEEDS is not None:
    torch.manual_seed(SEEDS[0]) # 1
    torch.cuda.manual_seed_all(SEEDS[0]) # 1
agent_1 = Agent(
    alpha       = agent_1_param_dic["alpha"],
    beta        = agent_1_param_dic["beta"],
    input_dims  = [agent_1_param_dic["input_dims"]],
    tau         = agent_1_param_dic["tau"],
    env         = system,
    batch_size  = agent_1_param_dic["batch_size"],
    layer1_size=agent_1_param_dic["layer1_size"],
    layer2_size=agent_1_param_dic["layer2_size"], 
    layer3_size=agent_1_param_dic["layer3_size"],
    layer4_size=agent_1_param_dic["layer4_size"],
    n_actions   = agent_1_param_dic["n_actions"],
    max_size = agent_1_param_dic["memory_max_size"],
    agent_name= agent_1_param_dic["agent_name"]
    ) 

if SEEDS is not None:
    torch.manual_seed(SEEDS[-1]) # 2
    torch.cuda.manual_seed_all(SEEDS[-1]) # 2
agent_2 = Agent(
    alpha       = agent_2_param_dic["alpha"],
    beta        = agent_2_param_dic["beta"],
    input_dims  = [agent_2_param_dic["input_dims"]],
    tau         = agent_2_param_dic["tau"],
    env         = system,
    batch_size  = agent_2_param_dic["batch_size"],
    layer1_size=agent_2_param_dic["layer1_size"],
    layer2_size=agent_2_param_dic["layer2_size"], 
    layer3_size=agent_2_param_dic["layer3_size"],
    layer4_size=agent_2_param_dic["layer4_size"],
    n_actions   = agent_2_param_dic["n_actions"],
    max_size = agent_2_param_dic["memory_max_size"],
    agent_name= agent_2_param_dic["agent_name"]
    ) 


if DRL_ALGO == 'td3':
    agent_1.load_models(
         load_file_actor = STORE_PATH + '/Actor_G_and_Phi_TD3',
         load_file_critic_1 = STORE_PATH + '/Critic_1_G_and_Phi_TD3',
         load_file_critic_2 = STORE_PATH + '/Critic_2_G_and_Phi_TD3'
         )
    agent_2.load_models(
         load_file_actor = STORE_PATH + '/Actor_UAV_TD3',
         load_file_critic_1 = STORE_PATH + '/Critic_1_UAV_TD3',
         load_file_critic_2 = STORE_PATH + '/Critic_2_UAV_TD3'
         )
elif DRL_ALGO == 'ddpg':
    agent_1.load_models(
         load_file_actor = STORE_PATH + '/Actor_G_and_Phi_ddpg',
         load_file_critic = STORE_PATH + '/Critic_G_and_Phi_ddpg'
         )
    agent_2.load_models(
         load_file_actor = STORE_PATH + '/Actor_UAV_ddpg',
         load_file_critic = STORE_PATH + '/Critic_UAV_ddpg'
         )

meta_dic = {}
print("***********************system information******************************")
print("folder_name:     "+str(system.data_manager.store_path))
meta_dic['folder_name'] = system.data_manager.store_path
print("user_num:        "+str(system.user_num))
meta_dic['user_num'] = system.user_num
print("if_dir:          "+str(system.if_dir_link))
meta_dic['if_dir_link'] = system.if_dir_link
print("if_with_RIS:     "+str(system.if_with_RIS))
meta_dic['if_with_RIS'] = system.if_with_RIS
print("if_user_m:       "+str(system.if_move_users))
meta_dic['if_move_users'] = system.if_move_users
print("RIS_ant_num:     "+str(system.RIS.ant_num))
meta_dic['system_RIS_ant_num'] = system.RIS.ant_num
print("UAV_ant_num:     "+str(system.UAV.ant_num))
meta_dic['system_UAV_ant_num'] = system.UAV.ant_num
print("if_movements:    "+str(system.if_movements))
meta_dic['system_if_movements'] = system.if_movements
print("reverse_x_y:     "+str(system.reverse_x_y))
meta_dic['system_reverse_x_y'] = system.reverse_x_y
print("if_UAV_pos_state:"+str(system.if_UAV_pos_state))
meta_dic['if_UAV_pos_state'] = system.if_UAV_pos_state

print("ep_num:          "+str(episode_num))
meta_dic['episode_num'] = episode_num
print("step_num:        "+str(step_num))
meta_dic['step_num'] = step_num
print("***********************agent_1 information******************************")
tplt = "{0:{2}^20}\t{1:{2}^20}"
for i in agent_1_param_dic:
    parm = agent_1_param_dic[i]
    print(tplt.format(i, parm, chr(12288)))
meta_dic["agent_1"] = agent_1_param_dic

print("***********************agent_2 information******************************")
for i in agent_2_param_dic:
    parm = agent_2_param_dic[i]
    print(tplt.format(i, parm, chr(12288)))
meta_dic["agent_2"] = agent_2_param_dic

system.data_manager.save_meta_data(meta_dic)

print("***********************traning information******************************")

try:
    while episode_cnt < episode_num:
        # 1 reset the whole system
        system.reset()
        step_cnt = 0
        score_per_ep = 0

        # 2 get the initial state
        if if_robust:
            tmp = system.observe()
            #z = np.random.multivariate_normal(np.zeros(2), 0.5*np.eye(2), size=len(tmp)).view(np.complex128)
            z = np.random.normal(size=len(tmp))
            observersion_1 = list(
                np.array(tmp) + 0.6 *1e-7* z
                )
        else:
            observersion_1 = system.observe()
        observersion_2 = list(system.UAV.coordinate)

        while step_cnt < step_num:
            # 1 count num of step in one episode
            step_cnt += 1
            # judge if pause the whole system
            if not system.render_obj.pause:
                # 2 choose action acoording to current state
                action_1 = agent_1.choose_action(observersion_1, greedy=agent_1_param_dic["action_noise_factor"] * math.pow((1-episode_cnt / episode_num), 2))
                action_2 = agent_2.choose_action(observersion_2, greedy=agent_2_param_dic["action_noise_factor"]* math.pow((1-episode_cnt / episode_num), 2))
                if if_BS:
                    action_2[0]=0
                    action_2[1]=0

                if if_Theta_fixed:
                    action_1[0+2 * system.UAV.ant_num * system.user_num:] = len(action_1[0+2 * system.UAV.ant_num * system.user_num:])*[0]

                if if_G_fixed:
                    action_1[0:0+2 * system.UAV.ant_num * system.user_num]=np.array([-0.0313, -0.9838, 0.3210, 1.0, -0.9786, -0.1448, 0.3518, 0.5813, -1.0, -0.2803, -0.4616, -0.6352, -0.1449, 0.7040, 0.4090, -0.8521]) * math.pow(episode_cnt / episode_num, 2) * 0.7
                    #action_1[0:0+2 * system.UAV.ant_num * system.user_num]=len(action_1[0:0+2 * system.UAV.ant_num * system.user_num])*[0.5]
                # 3 get newstate, reward
                if system.if_with_RIS:
                    new_state_1, reward, done, info = system.step(
                        action_0=action_2[0],
                        action_1=action_2[1],
                        G=action_1[0:0+2 * system.UAV.ant_num * system.user_num],
                        Phi=action_1[0+2 * system.UAV.ant_num * system.user_num:],
                        set_pos_x=action_2[0],
                        set_pos_y=action_2[1]
                    )
                    new_state_2 = list(system.UAV.coordinate)
                else:
                    new_state_1, reward, done, info = system.step(
                        action_0=action_2[0],
                        action_1=action_2[1],
                        G=action_1[0:0+2 * system.UAV.ant_num * system.user_num],
                        set_pos_x=action_2[0],
                        set_pos_y=action_2[1]
                    )
                    new_state_2 = list(system.UAV.coordinate)

                score_per_ep += reward

                # render
                system.render_obj.render(0.001) # no rendering for faster
                observersion_1 = new_state_1
                observersion_2 = new_state_2
                if done == True:
                    break

            else:
                system.render_obj.render_pause()  # no rendering for faster
                time.sleep(0.001) #time.sleep(1)

        system.reset()
        print("ep_num: "+str(episode_cnt)+"   ep_score:  "+str(score_per_ep))
        episode_cnt +=1
except KeyboardInterrupt:
    raise KeyboardInterrupt
finally:
    shutil.rmtree('data/storage/data')


================================================
FILE: td3.py
================================================
import os
import torch as T
#import torch.cuda as T
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
from data_manager import DataManager
class OUActionNoise(object):
    def __init__(self, mu, sigma=0.15, theta=.2, dt=1e-2, x0=None):
        self.theta = theta
        self.mu = mu
        self.sigma = sigma
        self.dt = dt
        self.x0 = x0
        self.reset()

    def __call__(self):
        x = self.x_prev + self.theta * (self.mu - self.x_prev) * self.dt + \
            self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.mu.shape)
        self.x_prev = x
        return x

    def reset(self):
        self.x_prev = self.x0 if self.x0 is not None else np.zeros_like(self.mu)

    def __repr__(self):
        return 'OrnsteinUhlenbeckActionNoise(mu={}, sigma={})'.format(
                                                            self.mu, self.sigma)

class AWGNActionNoise(object):
    def __init__(self, mu = 0, sigma=1):
        self.mu = mu
        self.sigma = sigma

    def __call__(self):
        #self.mu = mu
        #self.sigma = sigma
        x = np.random.normal(size=self.mu.shape) * self.sigma
        return x

class ReplayBuffer(object):
    def __init__(self, max_size, input_shape, n_actions):
        self.mem_size = max_size
        self.mem_cntr = 0
        self.state_memory = np.zeros((self.mem_size, *input_shape))
        self.new_state_memory = np.zeros((self.mem_size, *input_shape))
        self.action_memory = np.zeros((self.mem_size, n_actions))
        self.reward_memory = np.zeros(self.mem_size)
        self.terminal_memory = np.zeros(self.mem_size, dtype=np.float32)

    def store_transition(self, state, action, reward, state_, done):
        index = self.mem_cntr % self.mem_size
        self.state_memory[index] = state
        self.new_state_memory[index] = state_
        self.action_memory[index] = action
        self.reward_memory[index] = reward
        self.terminal_memory[index] = 1 - done
        self.mem_cntr += 1

    def sample_buffer(self, batch_size):
        max_mem = min(self.mem_cntr, self.mem_size)

        batch = np.random.choice(max_mem, batch_size)

        states = self.state_memory[batch]
        actions = self.action_memory[batch]
        rewards = self.reward_memory[batch]
        states_ = self.new_state_memory[batch]
        terminal = self.terminal_memory[batch]

        return states, actions, rewards, states_, terminal

class CriticNetwork(nn.Module):
    def __init__(self, beta, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4_dims, n_actions, name,
                 chkpt_dir='C:\\demo\\IRS_TD3_minimal\\main_foder\\tmp\\TD3', load_file = ''):
        super(CriticNetwork, self).__init__()
        self.input_dims = input_dims
        self.fc1_dims = fc1_dims
        self.fc2_dims = fc2_dims
        self.fc3_dims = fc3_dims
        self.fc4_dims = fc4_dims
        self.n_actions = n_actions
        self.checkpoint_file = os.path.join(chkpt_dir,name+'_TD3')
        self.load_file = 'C:\\demo\\other_branch\\Learning-based_Secure_Transmission_for_RIS_Aided_mmWave-UAV_Communications_with_Imperfect_CSI\\data\\mannal_store\\models\\Critic_UAV_TD3'
        self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims)
        f1 = 1./np.sqrt(self.fc1.weight.data.size()[0])
        T.nn.init.uniform_(self.fc1.weight.data, -f1, f1)
        T.nn.init.uniform_(self.fc1.bias.data, -f1, f1)
        #self.fc1.weight.data.uniform_(-f1, f1)
        #self.fc1.bias.data.uniform_(-f1, f1)
        self.bn1 = nn.LayerNorm(self.fc1_dims)

        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)
        f2 = 1./np.sqrt(self.fc2.weight.data.size()[0])
        #f2 = 0.002
        T.nn.init.uniform_(self.fc2.weight.data, -f2, f2)
        T.nn.init.uniform_(self.fc2.bias.data, -f2, f2)
        #self.fc2.weight.data.uniform_(-f2, f2)
        #self.fc2.bias.data.uniform_(-f2, f2)
        self.bn2 = nn.LayerNorm(self.fc2_dims)

        self.fc3 = nn.Linear(self.fc2_dims, self.fc3_dims)
        f3 = 1./np.sqrt(self.fc3.weight.data.size()[0])
        T.nn.init.uniform_(self.fc3.weight.data, -f3, f3)
        T.nn.init.uniform_(self.fc3.bias.data, -f3, f3)
        #self.fc2.weight.data.uniform_(-f2, f2)
        #self.fc2.bias.data.uniform_(-f2, f2)
        self.bn3 = nn.LayerNorm(self.fc3_dims)

        self.fc4 = nn.Linear(self.fc3_dims, self.fc4_dims)
        f4 = 1./np.sqrt(self.fc4.weight.data.size()[0])
        T.nn.init.uniform_(self.fc4.weight.data, -f4, f4)
        T.nn.init.uniform_(self.fc4.bias.data, -f4, f4)
        #self.fc2.weight.data.uniform_(-f2, f2)
        #self.fc2.bias.data.uniform_(-f2, f2)
        self.bn4 = nn.LayerNorm(self.fc4_dims)

        self.action_value = nn.Linear(self.n_actions, self.fc4_dims)
        f5 = 0.003
        self.q = nn.Linear(self.fc4_dims, 1)
        T.nn.init.uniform_(self.q.weight.data, -f5, f5)
        T.nn.init.uniform_(self.q.bias.data, -f5, f5)
        #self.q.weight.data.uniform_(-f3, f3)
        #self.q.bias.data.uniform_(-f3, f3)

        self.optimizer = optim.Adam(self.parameters(), lr=beta)
#        if torch.cuda.available():
#            import torch.cuda as T
#        else:
#            import torch as T
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')

        self.to(self.device)

    def forward(self, state, action):
        state_value = self.fc1(state)
        state_value = self.bn1(state_value)
        state_value = F.relu(state_value)
        state_value = self.fc2(state_value)
        state_value = self.bn2(state_value)
        state_value = F.relu(state_value)
        state_value = self.fc3(state_value)
        state_value = self.bn3(state_value)
        state_value = F.relu(state_value)
        state_value = self.fc4(state_value)
        state_value = self.bn4(state_value)

        action_value = F.relu(self.action_value(action))
        state_action_value = F.relu(T.add(state_value, action_value))
        state_action_value = self.q(state_action_value)

        return state_action_value

    def save_checkpoint(self):
        print('... saving checkpoint ...')
        T.save(self.state_dict(), self.checkpoint_file)

    def load_checkpoint(self,load_file = ''):
        print('... loading checkpoint ...')
        if T.cuda.is_available():
            self.load_state_dict(T.load(load_file))
        else:
            self.load_state_dict(T.load(load_file, map_location=T.device('cpu')))

class ActorNetwork(nn.Module):
    def __init__(self, alpha, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4_dims, n_actions, name,
                 chkpt_dir='C:\\demo\\IRS_TD3_minimal\\main_foder\\tmp\\TD3', load_file = ''):
        super(ActorNetwork, self).__init__()
        self.input_dims = input_dims
        self.fc1_dims = fc1_dims
        self.fc2_dims = fc2_dims
        self.fc3_dims = fc3_dims
        self.fc4_dims = fc4_dims        
        self.n_actions = n_actions
        self.checkpoint_file = os.path.join(chkpt_dir,name+'_TD3')
        self.load_file = 'C:\\demo\\other_branch\\Learning-based_Secure_Transmission_for_RIS_Aided_mmWave-UAV_Communications_with_Imperfect_CSI\\data\\mannal_store\\models\\Actor_UAV_TD3'
        self.fc1 = nn.Linear(*self.input_dims, self.fc1_dims)
        f1 = 1./np.sqrt(self.fc1.weight.data.size()[0])
#        T.nn.init.uniform_(self.fc1.weight.data, -f1, f1)
#        T.nn.init.uniform_(self.fc1.bias.data, -f1, f1)
        self.fc1.weight.data.uniform_(-f1, f1)
        self.fc1.bias.data.uniform_(-f1, f1)
        self.bn1 = nn.LayerNorm(self.fc1_dims)

        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)
        #f2 = 0.002
        f2 = 1./np.sqrt(self.fc2.weight.data.size()[0])
#        T.nn.init.uniform_(self.fc2.weight.data, -f2, f2)
#        T.nn.init.uniform_(self.fc2.bias.data, -f2, f2)
        self.fc2.weight.data.uniform_(-f2, f2)
        self.fc2.bias.data.uniform_(-f2, f2)
        self.bn2 = nn.LayerNorm(self.fc2_dims)

        self.fc3 = nn.Linear(self.fc2_dims, self.fc3_dims)
        f3 = 1./np.sqrt(self.fc3.weight.data.size()[0])
#        T.nn.init.uniform_(self.fc3.weight.data, -f3, f3)
#        T.nn.init.uniform_(self.fc3.bias.data, -f3, f3)
        self.fc2.weight.data.uniform_(-f2, f2)
        self.fc2.bias.data.uniform_(-f2, f2)
        self.bn3 = nn.LayerNorm(self.fc3_dims)

        self.fc4 = nn.Linear(self.fc3_dims, self.fc4_dims)
        f4 = 1./np.sqrt(self.fc4.weight.data.size()[0])
#        T.nn.init.uniform_(self.fc4.weight.data, -f4, f4)
#        T.nn.init.uniform_(self.fc4.bias.data, -f4, f4)
        self.fc2.weight.data.uniform_(-f2, f2)
        self.fc2.bias.data.uniform_(-f2, f2)
        self.bn4 = nn.LayerNorm(self.fc4_dims)

        #f3 = 0.004
        f5 = 0.003
        self.mu = nn.Linear(self.fc4_dims, self.n_actions)
#        T.nn.init.uniform_(self.mu.weight.data, -f5, f5)
#        T.nn.init.uniform_(self.mu.bias.data, -f5, f5)
        self.mu.weight.data.uniform_(-f3, f3)
        self.mu.bias.data.uniform_(-f3, f3)

        self.optimizer = optim.Adam(self.parameters(), lr=alpha)
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')

        self.to(self.device)

    def forward(self, state):
        x = self.fc1(state)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = self.bn2(x)
        x = F.relu(x)
        x = self.fc3(x)
        x = self.bn3(x)
        x = F.relu(x)
        x = self.fc4(x)
        x = self.bn4(x)
        x = F.relu(x)
        x = T.tanh(self.mu(x))

        return x

    def save_checkpoint(self):
        print('... saving checkpoint ...')
        T.save(self.state_dict(), self.checkpoint_file)

    def load_checkpoint(self, load_file=''):
        print('... loading checkpoint ...')
        if T.cuda.is_available():
            self.load_state_dict(T.load(load_file))
        else:
            self.load_state_dict(T.load(load_file, map_location=T.device('cpu')))

class Agent(object):
    def __init__(self, alpha, beta, input_dims, tau, env, gamma=0.99,
                 n_actions=2, max_size=1000000, layer1_size=400,
                 layer2_size=300, layer3_size=256, layer4_size=128, batch_size=64, 
                 update_actor_interval=2, noise = 'AWGN', agent_name = 'default', load_file = ''):
        self.load_file = load_file
        self.layer1_size = layer1_size
        self.layer2_size = layer2_size
        self.layer3_size = layer3_size
        self.layer4_size = layer4_size
        self.gamma = gamma
        self.tau = tau
        self.memory = ReplayBuffer(max_size, input_dims, n_actions)
        self.batch_size = batch_size
        self.learn_step_cntr = 0
        self.update_actor_iter = update_actor_interval

        self.actor = ActorNetwork(alpha, input_dims, layer1_size,
                                  layer2_size, layer3_size, layer4_size, n_actions=n_actions,
                                  name='Actor_' + agent_name,chkpt_dir=env.data_manager.store_path )
        self.critic_1 = CriticNetwork(beta, input_dims, layer1_size,
                                    layer2_size, layer3_size, layer4_size, n_actions=n_actions,
                                    name='Critic_1_' + agent_name,chkpt_dir=env.data_manager.store_path)
        self.critic_2 = CriticNetwork(beta, input_dims, layer1_size,
                                    layer2_size, layer3_size, layer4_size, n_actions=n_actions,
                                    name='Critic_2_' + agent_name,chkpt_dir=env.data_manager.store_path)

        self.target_actor = ActorNetwork(alpha, input_dims, layer1_size,
                                         layer2_size, layer3_size, layer4_size, n_actions=n_actions,
                                         name='TargetActor_' + agent_name,chkpt_dir=env.data_manager.store_path)
        self.target_critic_1 = CriticNetwork(beta, input_dims, layer1_size,
                                           layer2_size, layer3_size, layer4_size, n_actions=n_actions,
                                           name='TargetCritic_1_' + agent_name,chkpt_dir=env.data_manager.store_path)
        self.target_critic_2 = CriticNetwork(beta, input_dims, layer1_size,
                                           layer2_size, layer3_size, layer4_size, n_actions=n_actions,
                                           name='TargetCritic_2_' + agent_name,chkpt_dir=env.data_manager.store_path)
        if noise == 'OU':
            self.noise = OUActionNoise(mu=np.zeros(n_actions))
        elif noise == 'AWGN':
            self.noise = AWGNActionNoise(mu = np.zeros(n_actions))
        # tau = 1 means copy parameters to target
        self.update_network_parameters(tau=1)

    def choose_action(self, observation, greedy=0.5, epsilon = 1):
        self.actor.eval()
        observation = T.tensor(observation, dtype=T.float).to(self.actor.device)
        mu = self.actor.forward(observation).to(self.actor.device)
        mu_prime = mu + T.tensor(greedy * self.noise(),
                                 dtype=T.float).to(self.actor.device)
        self.actor.train()
        return mu_prime.cpu().detach().numpy()


    def remember(self, state, action, reward, new_state, done):
        self.memory.store_transition(state, action, reward, new_state, done)

    def learn(self):
        if self.memory.mem_cntr < self.batch_size:
            return
        # the done here is opposite of the done in the env
        state, action, reward, new_state, done = \
                                      self.memory.sample_buffer(self.batch_size)

        # trun s, a, r, new_s into tensor
        reward = T.tensor(reward, dtype=T.float).to(self.critic_1.device)
        done = T.tensor(done).to(self.critic_1.device)
        new_state = T.tensor(new_state, dtype=T.float).to(self.critic_1.device)
        action = T.tensor(action, dtype=T.float).to(self.critic_1.device)
        state = T.tensor(state, dtype=T.float).to(self.critic_1.device)

        # trun on evaliation mode of target actor, target critic, critic net
        # fix these three nets
        self.target_actor.eval()
        self.target_critic_1.eval()
        self.target_critic_2.eval()
        self.critic_1.eval()
        self.critic_2.eval()

        
        target_actions = self.target_actor.forward(new_state)
        # target_actions = target_actions + \
        #         T.clamp(T.tensor(np.random.normal(scale=0.2)), -0.5, 0.5)
        # # might break if elements of min and max are not all equal
        # target_actions = T.clamp(target_actions, self.min_action[0], self.max_action[0])
        
        critic_value_1_ = self.target_critic_1.forward(new_state, target_actions)
        critic_value_2_ = self.target_critic_2.forward(new_state, target_actions)
        
        critic_value_1 = self.critic_1.forward(state, action)
        critic_value_2 = self.critic_2.forward(state, action)
        
        critic_value_ = T.min(critic_value_1_, critic_value_2_)
        
        target = []
        for j in range(self.batch_size):
            target.append(reward[j] + self.gamma*critic_value_[j]*done[j])
        target = T.tensor(target).to(self.critic_1.device)
        target = target.view(self.batch_size, 1)
        
        # here update the critic net using mse of (r + gamma * Q_argmax_a*(newstate, a*)) - Q(state, action)
        self.critic_1.train()
        self.critic_2.train()
        
        self.critic_1.optimizer.zero_grad()
        self.critic_2.optimizer.zero_grad()
        
        critic_1_loss = F.mse_loss(target, critic_value_1)
        critic_2_loss = F.mse_loss(target, critic_value_2)
        critic_loss = critic_1_loss + critic_2_loss
        critic_loss.backward()
        
        self.critic_1.optimizer.step()
        self.critic_2.optimizer.step()
        
        self.learn_step_cntr += 1
        
        if self.learn_step_cntr % self.update_actor_iter != 0:
            return

        # here update the actor net by policy gradient
        # first fix the critic net
        self.critic_1.eval()
        self.critic_2.eval()
        
        self.actor.optimizer.zero_grad()
        mu = self.actor.forward(state)
        self.actor.train()
        actor_q1_loss = self.critic_1.forward(state, mu)
        actor_loss = -T.mean(actor_q1_loss)
        actor_loss.backward()
        self.actor.optimizer.step()

        self.update_network_parameters()
        

    def update_network_parameters(self, tau=None):
        if tau is None:
            tau = self.tau

        actor_params = self.actor.named_parameters()
        critic_1_params = self.critic_1.named_parameters()
        critic_2_params = self.critic_2.named_parameters()
        target_actor_params = self.target_actor.named_parameters()
        target_critic_1_params = self.target_critic_1.named_parameters()
        target_critic_2_params = self.target_critic_2.named_parameters()

        critic_1_state_dict = dict(critic_1_params)
        critic_2_state_dict = dict(critic_2_params)
        actor_state_dict = dict(actor_params)
        target_actor_state_dict = dict(target_actor_params)
        target_critic_1_state_dict = dict(target_critic_1_params)
        target_critic_2_state_dict = dict(target_critic_2_params)

        for name in critic_1_state_dict:
            critic_1_state_dict[name] = tau*critic_1_state_dict[name].clone() + \
                    (1-tau)*target_critic_1_state_dict[name].clone()

        for name in critic_2_state_dict:
            critic_2_state_dict[name] = tau*critic_2_state_dict[name].clone() + \
                    (1-tau)*target_critic_2_state_dict[name].clone()

        for name in actor_state_dict:
            actor_state_dict[name] = tau*actor_state_dict[name].clone() + \
                    (1-tau)*target_actor_state_dict[name].clone()
                    
        self.target_critic_1.load_state_dict(critic_1_state_dict)
        self.target_critic_2.load_state_dict(critic_2_state_dict)
        self.target_actor.load_state_dict(actor_state_dict)


    def save_models(self):
        self.actor.save_checkpoint()
        self.target_actor.save_checkpoint()
        self.critic_1.save_checkpoint()
        self.critic_2.save_checkpoint()
        self.target_critic_1.save_checkpoint()
        self.target_critic_2.save_checkpoint()

    def load_models(self, load_file_actor = '',load_file_critic_1 ='',load_file_critic_2 =''):
        self.actor.load_checkpoint(load_file = load_file_actor)
        self.target_actor.load_checkpoint(load_file = load_file_actor)
        self.critic_1.load_checkpoint(load_file = load_file_critic_1)
        self.critic_2.load_checkpoint(load_file = load_file_critic_2)
        self.target_critic_1.load_checkpoint(load_file = load_file_critic_1)
        self.target_critic_2.load_checkpoint(load_file = load_file_critic_2)

Download .txt

gitextract_sya5mzu7/

├── README.md
├── batch_eval.sh
├── batch_train.sh
├── channel.py
├── data/
│   ├── .vscode/
│   │   └── settings.json
│   ├── data_test.py
│   ├── init_location.xlsx
│   ├── readme.md
│   ├── simulation_result.xlsx
│   └── storage/
│       ├── benchmark/
│       │   ├── ddpg_see_benchmark/
│       │   │   ├── Actor_UAV_ddpg
│       │   │   ├── Critic_UAV_ddpg
│       │   │   ├── TargetActor_UAV_ddpg
│       │   │   └── TargetCritic_UAV_ddpg
│       │   ├── ddpg_ssr_benchmark/
│       │   │   ├── Actor_UAV_ddpg
│       │   │   ├── Critic_UAV_ddpg
│       │   │   ├── TargetActor_UAV_ddpg
│       │   │   └── TargetCritic_UAV_ddpg
│       │   ├── td3_see_benchmark/
│       │   │   ├── Actor_UAV_TD3
│       │   │   ├── Critic_1_UAV_TD3
│       │   │   ├── Critic_2_UAV_TD3
│       │   │   ├── TargetActor_UAV_TD3
│       │   │   ├── TargetCritic_1_UAV_TD3
│       │   │   └── TargetCritic_2_UAV_TD3
│       │   └── td3_ssr_benchmark/
│       │       ├── Actor_UAV_TD3
│       │       ├── Critic_1_UAV_TD3
│       │       ├── Critic_2_UAV_TD3
│       │       ├── TargetActor_UAV_TD3
│       │       ├── TargetCritic_1_UAV_TD3
│       │       └── TargetCritic_2_UAV_TD3
│       └── readme.md
├── data_manager.py
├── ddpg.py
├── entity.py
├── env.py
├── env1.py
├── load_and_plot.py
├── main_RIS.py
├── main_ref.py
├── main_train.py
├── math_tool.py
├── plot_see.py
├── plot_ssr.py
├── plot_traj.py
├── render.py
├── requirements.txt
├── run_simulation.py
└── td3.py

Download .txt

SYMBOL INDEX (181 symbols across 13 files)

FILE: channel.py
  class mmWave_channel (line 6) | class mmWave_channel(object):
    method __init__ (line 12) | def __init__(self, transmitter, receiver, frequncy):
    method init_type (line 33) | def init_type(self):
    method get_channel_path_loss (line 70) | def get_channel_path_loss(self):
    method get_estimated_channel_matrix (line 81) | def get_estimated_channel_matrix(self):
    method generate_array_response (line 123) | def generate_array_response(self, transceiver, theta, fai):
    method update_CSI (line 155) | def update_CSI(self):

FILE: data_manager.py
  class DataManager (line 7) | class DataManager(object):
    method __init__ (line 13) | def __init__(self, store_list = ['beamforming_matrix', 'reflecting_coe...
    method save_file (line 34) | def save_file(self, episode_cnt = 10):
    method save_meta_data (line 45) | def save_meta_data(self, meta_dic):
    method init_format (line 51) | def init_format(self):
    method read_init_location (line 58) | def read_init_location(self, entity_type = 'user', index = 0):
    method store_data (line 67) | def store_data(self, row_data, value_name):

FILE: ddpg.py
  class OUActionNoise (line 9) | class OUActionNoise(object):
    method __init__ (line 10) | def __init__(self, mu, sigma=0.15, theta=.2, dt=1e-2, x0=None):
    method __call__ (line 18) | def __call__(self):
    method reset (line 24) | def reset(self):
    method __repr__ (line 27) | def __repr__(self):
  class AWGNActionNoise (line 31) | class AWGNActionNoise(object):
    method __init__ (line 32) | def __init__(self, mu = 0, sigma=1):
    method __call__ (line 36) | def __call__(self):
  class ReplayBuffer (line 42) | class ReplayBuffer(object):
    method __init__ (line 43) | def __init__(self, max_size, input_shape, n_actions):
    method store_transition (line 52) | def store_transition(self, state, action, reward, state_, done):
    method sample_buffer (line 61) | def sample_buffer(self, batch_size):
  class CriticNetwork (line 74) | class CriticNetwork(nn.Module):
    method __init__ (line 75) | def __init__(self, beta, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4...
    method forward (line 136) | def forward(self, state, action):
    method save_checkpoint (line 155) | def save_checkpoint(self):
    method load_checkpoint (line 159) | def load_checkpoint(self,load_file = ''):
  class ActorNetwork (line 166) | class ActorNetwork(nn.Module):
    method __init__ (line 167) | def __init__(self, alpha, input_dims, fc1_dims, fc2_dims, fc3_dims, fc...
    method forward (line 224) | def forward(self, state):
    method save_checkpoint (line 241) | def save_checkpoint(self):
    method load_checkpoint (line 245) | def load_checkpoint(self, load_file=''):
  class Agent (line 252) | class Agent(object):
    method __init__ (line 253) | def __init__(self, alpha, beta, input_dims, tau, env, gamma=0.99,
    method choose_action (line 286) | def choose_action(self, observation, greedy=0.5, epsilon = 1):
    method remember (line 296) | def remember(self, state, action, reward, new_state, done):
    method learn (line 299) | def learn(self):
    method update_network_parameters (line 352) | def update_network_parameters(self, tau=None):
    method save_models (line 391) | def save_models(self):
    method load_models (line 397) | def load_models(self, load_file_actor = '',load_file_critic =''):
    method check_actor_params (line 403) | def check_actor_params(self):

FILE: entity.py
  class UAV (line 5) | class UAV(object):
    method __init__ (line 12) | def __init__(self, coordinate, index = 0, rotation = 0, ant_num=16, an...
    method reset (line 29) | def reset(self, coordinate):
    method update_coor_sys (line 35) | def update_coor_sys(self, delta_angle):
    method update_coordinate (line 51) | def update_coordinate(self, distance_delta_d, direction_fai):
    method move (line 60) | def move(self, distance_delta_d, direction_fai, delta_angle = 0):
  class RIS (line 67) | class RIS(object):
    method __init__ (line 73) | def __init__(self, coordinate, coor_sys_z, index = 0, ant_num=36, ant_...
  class User (line 93) | class User(object):
    method __init__ (line 97) | def __init__(self, coordinate, index, ant_num = 1, ant_type = 'single'):
    method reset (line 118) | def reset(self, coordinate):
    method update_coordinate (line 124) | def update_coordinate(self, distance_delta_d, direction_fai):
    method move (line 133) | def move(self, distance_delta_d, direction_fai):
  class Attacker (line 139) | class Attacker(object):
    method __init__ (line 143) | def __init__(self, coordinate, index, ant_num = 1, ant_type= 'single'):
    method reset (line 162) | def reset(self, coordinate):
    method update_coordinate (line 168) | def update_coordinate(self, distance_delta_d, direction_fai):
    method move (line 177) | def move(self, distance_delta_d, direction_fai):

FILE: env.py
  function get_energy_consumption (line 34) | def get_energy_consumption(v_t):
  class MiniSystem (line 57) | class MiniSystem(object):
    method __init__ (line 63) | def __init__(self, UAV_num = 1, RIS_num = 1, user_num = 1, attacker_nu...
    method reset (line 138) | def reset(self):
    method step (line 170) | def step(self, action_0 = 0, action_1 = 0, G = 0, Phi = 0, set_pos_x =...
    method reward (line 265) | def reward(self):
    method observe (line 287) | def observe(self):
    method store_current_system_sate (line 303) | def store_current_system_sate(self):
    method update_channel_capacity (line 338) | def update_channel_capacity(self):
    method calculate_comprehensive_channel_of_attacker_p (line 356) | def calculate_comprehensive_channel_of_attacker_p(self, p):
    method calculate_comprehensive_channel_of_user_k (line 366) | def calculate_comprehensive_channel_of_user_k(self, k):
    method calculate_capacity_of_user_k (line 376) | def calculate_capacity_of_user_k(self, k):
    method calculate_capacity_array_of_attacker_p (line 397) | def calculate_capacity_array_of_attacker_p(self, p):
    method calculate_secure_capacity_of_user_k (line 426) | def calculate_secure_capacity_of_user_k(self, k=2):
    method get_system_action_dim (line 436) | def get_system_action_dim(self):
    method get_system_state_dim (line 452) | def get_system_state_dim(self):

FILE: env1.py
  class minimal_IRS_system (line 6) | class minimal_IRS_system():
    method __init__ (line 7) | def __init__(self, BS_M = 8, IRS_N_x = 4, IRS_N_y =2, K = 8, statistic...
    method calculate_total_transmit_power (line 137) | def calculate_total_transmit_power(self):
    method dBm_to_W (line 148) | def dBm_to_W(self, dBm):
    method generate_noise (line 153) | def generate_noise(self, mu, segma, size = 1):
    method calculate_received_signal_amplitude (line 167) | def calculate_received_signal_amplitude(self):
    method calculate_SINR_of_kth_user (line 179) | def calculate_SINR_of_kth_user(self, k):
    method calculate_data_rate (line 195) | def calculate_data_rate(self):
    method calculate_theta_and_psi (line 205) | def calculate_theta_and_psi(self, ref_norm_vector, coordinate):
    method calculate_channel_BS_to_IRS (line 217) | def calculate_channel_BS_to_IRS(self, IRS): # IRS elements spacing hal...
    method calculate_channel_IRS_to_user (line 267) | def calculate_channel_IRS_to_user(self, IRS):
    method reset (line 292) | def reset(self):
    method apply_action (line 302) | def apply_action(self, action):
    method get_state (line 330) | def get_state(self):
    method step (line 407) | def step(self, action):
    method render (line 431) | def render(self):

FILE: load_and_plot.py
  function get_energy_consumption (line 45) | def get_energy_consumption(v_t):
  function read_init_location (line 70) | def read_init_location(entity_type = 'user', index = 0):
  class LoadAndPlot (line 81) | class LoadAndPlot(object):
    method __init__ (line 85) | def __init__(self, store_path, \
    method load_one_ep (line 100) | def load_one_ep(self, file_name):
    method load_all_steps (line 105) | def load_all_steps(self):
    method plot (line 150) | def plot(self):
    method plot_one_RIS_element (line 321) | def plot_one_RIS_element(self, index):
    method plot_trajectory (line 346) | def plot_trajectory(self):
    method restruct (line 434) | def restruct(self):

FILE: math_tool.py
  function cartesian_coordinate_to_spherical_coordinate (line 11) | def cartesian_coordinate_to_spherical_coordinate(cartesian_coordinate):
  function vecter_normalization (line 42) | def vecter_normalization(cartesian_coordinate):
  function get_coor_ref (line 45) | def get_coor_ref(coor_sys, coor):
  function dB_to_normal (line 56) | def dB_to_normal(dB):
  function normal_to_dB (line 63) | def normal_to_dB(normal):
  function diag_to_vector (line 70) | def diag_to_vector(diag):
  function vector_to_diag (line 80) | def vector_to_diag(vector):
  function bigger_than_zero (line 90) | def bigger_than_zero(value):
  function dataframe_to_dictionary (line 96) | def dataframe_to_dictionary(df):
  function convert_list_to_complex_matrix (line 102) | def convert_list_to_complex_matrix(list_real, shape):
  function convert_list_to_complex_diag (line 116) | def convert_list_to_complex_diag(list_real, diag_row_num):
  function map_to (line 127) | def map_to(x, x_range:tuple, y_range:tuple):

FILE: plot_see.py
  function get_energy_consumption (line 33) | def get_energy_consumption(v_t):
  function read_init_location (line 58) | def read_init_location(entity_type = 'user', index = 0):
  class LoadAndPlot (line 69) | class LoadAndPlot(object):
    method __init__ (line 73) | def __init__(self, store_paths, \
    method load_one_ep (line 88) | def load_one_ep(self, file_name):
    method load_all_steps (line 93) | def load_all_steps(self):
    method plot (line 138) | def plot(self):

FILE: plot_ssr.py
  function get_energy_consumption (line 33) | def get_energy_consumption(v_t):
  function read_init_location (line 58) | def read_init_location(entity_type = 'user', index = 0):
  class LoadAndPlot (line 69) | class LoadAndPlot(object):
    method __init__ (line 73) | def __init__(self, store_paths, \
    method load_one_ep (line 88) | def load_one_ep(self, file_name):
    method load_all_steps (line 93) | def load_all_steps(self):
    method plot (line 138) | def plot(self):

FILE: plot_traj.py
  function get_energy_consumption (line 33) | def get_energy_consumption(v_t):
  function read_init_location (line 58) | def read_init_location(entity_type = 'user', index = 0):
  class LoadAndPlot (line 69) | class LoadAndPlot(object):
    method __init__ (line 73) | def __init__(self, store_paths, \
    method load_one_ep (line 87) | def load_one_ep(self, file_name):
    method load_all_steps (line 92) | def load_all_steps(self):
    method plot (line 137) | def plot(self):

FILE: render.py
  class Arrow3D (line 7) | class Arrow3D(FancyArrowPatch):
    method __init__ (line 8) | def __init__(self, xs, ys, zs, *args, **kwargs):
    method draw (line 12) | def draw(self, renderer):
  class Render (line 18) | class Render(object):
    method __init__ (line 22) | def __init__(self, system, canv_x = (-25, 25), canv_y = (0, 50), canv_...
    method render_pause (line 32) | def render_pause(self):
    method render (line 47) | def render(self, interval):
    method plot_click (line 61) | def plot_click(self, event):
    method plot_config (line 64) | def plot_config(self):
    method plot_entities (line 77) | def plot_entities(self, ax):
    method plot_channels (line 122) | def plot_channels(self, ax):
    method plot_one_channel (line 137) | def plot_one_channel(self, ax, channel, color, text = "channel"):
    method plot_text (line 163) | def plot_text(self, ax):

FILE: td3.py
  class OUActionNoise (line 9) | class OUActionNoise(object):
    method __init__ (line 10) | def __init__(self, mu, sigma=0.15, theta=.2, dt=1e-2, x0=None):
    method __call__ (line 18) | def __call__(self):
    method reset (line 24) | def reset(self):
    method __repr__ (line 27) | def __repr__(self):
  class AWGNActionNoise (line 31) | class AWGNActionNoise(object):
    method __init__ (line 32) | def __init__(self, mu = 0, sigma=1):
    method __call__ (line 36) | def __call__(self):
  class ReplayBuffer (line 42) | class ReplayBuffer(object):
    method __init__ (line 43) | def __init__(self, max_size, input_shape, n_actions):
    method store_transition (line 52) | def store_transition(self, state, action, reward, state_, done):
    method sample_buffer (line 61) | def sample_buffer(self, batch_size):
  class CriticNetwork (line 74) | class CriticNetwork(nn.Module):
    method __init__ (line 75) | def __init__(self, beta, input_dims, fc1_dims, fc2_dims, fc3_dims, fc4...
    method forward (line 136) | def forward(self, state, action):
    method save_checkpoint (line 155) | def save_checkpoint(self):
    method load_checkpoint (line 159) | def load_checkpoint(self,load_file = ''):
  class ActorNetwork (line 166) | class ActorNetwork(nn.Module):
    method __init__ (line 167) | def __init__(self, alpha, input_dims, fc1_dims, fc2_dims, fc3_dims, fc...
    method forward (line 224) | def forward(self, state):
    method save_checkpoint (line 241) | def save_checkpoint(self):
    method load_checkpoint (line 245) | def load_checkpoint(self, load_file=''):
  class Agent (line 252) | class Agent(object):
    method __init__ (line 253) | def __init__(self, alpha, beta, input_dims, tau, env, gamma=0.99,
    method choose_action (line 295) | def choose_action(self, observation, greedy=0.5, epsilon = 1):
    method remember (line 305) | def remember(self, state, action, reward, new_state, done):
    method learn (line 308) | def learn(self):
    method update_network_parameters (line 387) | def update_network_parameters(self, tau=None):
    method save_models (line 422) | def save_models(self):
    method load_models (line 430) | def load_models(self, load_file_actor = '',load_file_critic_1 ='',load...

Download .json

Condensed preview — 47 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (192K chars).

[
  {
    "path": "README.md",
    "chars": 9829,
    "preview": "# Deep Reinforcement Learning for Secrecy Energy-Efficient UAV Communication with Reconfigurable Intelligent Surfaces\n\n*"
  },
  {
    "path": "batch_eval.sh",
    "chars": 1932,
    "preview": "#!/bin/bash\n\necho \n\necho ddpg_ssr\npython3 load_and_plot.py --path data/storage/trained_uav/ddpg_ssr --ep-num 300\necho dd"
  },
  {
    "path": "batch_train.sh",
    "chars": 1775,
    "preview": "#!/bin/bash\n\necho \n\necho ddpg_ssr\npython3 main_train.py --drl ddpg --reward ssr --ep-num 300 --trained-uav\necho ddpg_ssr"
  },
  {
    "path": "channel.py",
    "chars": 6795,
    "preview": "import numpy as np\nimport math\nimport cmath\nfrom math_tool import *\n\nclass mmWave_channel(object):\n    \"\"\"\n    generate "
  },
  {
    "path": "data/.vscode/settings.json",
    "chars": 70,
    "preview": "{\n    \"python.pythonPath\": \"C:\\\\Users\\\\67039\\\\anaconda3\\\\python.exe\"\n}"
  },
  {
    "path": "data/data_test.py",
    "chars": 129,
    "preview": "import pandas as pd \nuser_sheet = pd.read_excel('init_location.xlsx', sheet_name='user')\nuser_data = user_sheet\n\nprint(u"
  },
  {
    "path": "data/readme.md",
    "chars": 227,
    "preview": "#### init_location.xlsx\n- the config files for the initilization postion of each entity (such as UAV, users, attackers, "
  },
  {
    "path": "data/storage/readme.md",
    "chars": 256,
    "preview": "# This directory stores all experiments\n\n- Each experiment will be saved as a file project in this directory\n- benchmark"
  },
  {
    "path": "data_manager.py",
    "chars": 3051,
    "preview": "import numpy as np\nimport scipy.io\nimport pandas as pd\nimport os\nimport time, csv\n\nclass DataManager(object):\n    \"\"\"\n  "
  },
  {
    "path": "ddpg.py",
    "chars": 17900,
    "preview": "import os\nimport torch as T\n#import torch.cuda as T\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.o"
  },
  {
    "path": "entity.py",
    "chars": 6191,
    "preview": "import numpy as np\nimport math\n#from math_tool import *\n\nclass UAV(object):\n    \"\"\"\n    UAV object with coordinate \n    "
  },
  {
    "path": "env.py",
    "chars": 19041,
    "preview": "#%matplotlib inline\nimport numpy as np\nfrom entity import *\nfrom channel import *\nfrom math_tool import *\nfrom datetime "
  },
  {
    "path": "env1.py",
    "chars": 20072,
    "preview": "import numpy as np\nimport math\nimport cmath\nnp.random.seed(42)\n\nclass minimal_IRS_system():\n    def __init__(self, BS_M "
  },
  {
    "path": "load_and_plot.py",
    "chars": 17003,
    "preview": "import matplotlib.pyplot as plt\nimport numpy as np\nimport cmath\nfrom scipy.io import loadmat, savemat\nimport pandas as p"
  },
  {
    "path": "main_RIS.py",
    "chars": 99,
    "preview": "from env import *\r\nepisode_num= 100\r\nstep_num = 100\r\na= int(5/5 * episode_num * step_num)\r\nprint(a)"
  },
  {
    "path": "main_ref.py",
    "chars": 2759,
    "preview": "from ddpg import Agent\nfrom env1 import minimal_IRS_system # orginal\nfrom env import MiniSystem\nimport numpy as np\n#from"
  },
  {
    "path": "main_train.py",
    "chars": 11108,
    "preview": "# debug field\nimport os\nos.environ[\"KMP_DUPLICATE_LIB_OK\"]=\"TRUE\"\n\nimport argparse\n\n# get argument from user\nparser = ar"
  },
  {
    "path": "math_tool.py",
    "chars": 3778,
    "preview": "import math\nimport cmath\nimport numpy as np\nimport pandas as pd\nfrom numpy import *\nfrom matplotlib import pyplot as plt"
  },
  {
    "path": "plot_see.py",
    "chars": 8485,
    "preview": "import matplotlib as mpl\nmpl.rcParams['figure.dpi'] = 300\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport cmat"
  },
  {
    "path": "plot_ssr.py",
    "chars": 7338,
    "preview": "import matplotlib as mpl\nmpl.rcParams['figure.dpi'] = 300\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport cmat"
  },
  {
    "path": "plot_traj.py",
    "chars": 11048,
    "preview": "import matplotlib as mpl\nmpl.rcParams['figure.dpi'] = 100\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport cmat"
  },
  {
    "path": "render.py",
    "chars": 5905,
    "preview": "from mpl_toolkits import mplot3d\nimport matplotlib.pyplot as plt\nfrom mpl_toolkits.mplot3d import Axes3D\nfrom matplotlib"
  },
  {
    "path": "requirements.txt",
    "chars": 317,
    "preview": "# Twin-TD3 requirements\n# Usage: pip install -r requirements.txt\n\nmatplotlib==3.4.3\npyparsing==3.0.9\n# sklearn==0.0.post"
  },
  {
    "path": "run_simulation.py",
    "chars": 10669,
    "preview": "# debug field\nimport os\nos.environ[\"KMP_DUPLICATE_LIB_OK\"]=\"TRUE\"\n\nimport argparse\n\n# get argument from user\nparser = ar"
  },
  {
    "path": "td3.py",
    "chars": 18835,
    "preview": "import os\nimport torch as T\n#import torch.cuda as T\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.o"
  }
]

// ... and 22 more files (download for full content)

About this extraction

This page contains the full source code of the yjwong1999/Twin-TD3 GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 47 files (180.3 KB), approximately 49.4k tokens, and a symbol index with 181 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo