Repository: golsun/deep-RL-trading
Branch: master
Commit: 4109834e9178
Files: 19
Total size: 36.7 KB
Directory structure:
gitextract_vp_b1_jb/
├── .gitignore
├── LICENSE
├── README.md
├── data/
│ ├── PairSamplerDB/
│ │ ├── randjump_100,1(10, 30)[]_A/
│ │ │ ├── db.pickle
│ │ │ └── param.json
│ │ └── randjump_100,1(10, 30)[]_B/
│ │ ├── db.pickle
│ │ └── param.json
│ └── SinSamplerDB/
│ ├── concat_half_base_A/
│ │ ├── db.pickle
│ │ └── param.json
│ └── concat_half_base_B/
│ ├── db.pickle
│ └── param.json
├── env.yml
└── src/
├── agents.py
├── emulator.py
├── lib.py
├── main.py
├── sampler.py
├── simulators.py
└── visualizer.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
*.pyc
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2018 Xiang Gao
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# **Playing trading games with deep reinforcement learning**
This repo contains the code for this [paper](https://arxiv.org/abs/1803.03916). Deep reinforcement learning is used to find optimal strategies in two scenarios:

* Momentum trading: capture the underlying dynamics
* Arbitrage trading: exploit the hidden relation among the inputs

Several neural network architectures are compared:

* Recurrent Neural Networks (GRU/LSTM)
* Convolutional Neural Network (CNN)
* Multi-Layer Perceptron (MLP)

### Dependencies
You can get all dependencies via the [Anaconda](https://conda.io/docs/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file) environment file, [env.yml](https://github.com/golsun/deep-RL-time-series/blob/master/env.yml):

    conda env create -f env.yml

### Play with it
Just call the main function:

    python main.py

You can play with the model parameters (specified in main.py). If you get good results, or run into any trouble, please contact me at gxiang1228@gmail.com.
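For reference, the Q-value target that `Agent.replay` (in `src/agents.py`) trains each network toward can be sketched as follows. This is a minimal illustration, not repo code: `q_target` and its arguments are names invented here; invalid actions are masked with NaN, as in `Agent.get_q_valid`.

```python
import numpy as np

def q_target(reward, done, next_q_valid, discount_factor=0.95):
    # Bellman target: r + gamma * max_a' Q(s', a'),
    # where Q-values of invalid next actions are NaN and thus ignored
    if done:
        return reward
    return reward + discount_factor * np.nanmax(next_q_valid)

# action 1 is invalid in the next state, so its Q-value is NaN-masked
print(q_target(1.0, False, [0.5, np.nan, 2.0]))  # 1.0 + 0.95 * 2.0 = 2.9
```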
================================================
FILE: data/PairSamplerDB/randjump_100,1(10, 30)[]_A/param.json
================================================
{"n_episodes": 100, "title": "randjump(5, (10, 30), 1, [])", "window_episode": 180, "forecast_horizon_range": [10, 30], "max_change_perc": 30.0, "noise_level": 5, "n_section": 1, "n_var": 2}
================================================
FILE: data/PairSamplerDB/randjump_100,1(10, 30)[]_B/param.json
================================================
{"n_episodes": 100, "title": "randjump(5, (10, 30), 1, [])", "window_episode": 180, "forecast_horizon_range": [10, 30], "max_change_perc": 30.0, "noise_level": 5, "n_section": 1, "n_var": 2}
================================================
FILE: data/SinSamplerDB/concat_half_base_A/param.json
================================================
{"n_episodes": 100, "title": "ConcatHalfSin+Base(0.5, (10, 40), (5, 80))", "window_episode": 180, "noise_amplitude_ratio": 0.5, "period_range": [10, 40], "amplitude_range": [5, 80], "can_half_period": true}
================================================
FILE: data/SinSamplerDB/concat_half_base_B/param.json
================================================
{"n_episodes": 100, "title": "ConcatHalfSin+Base(0.5, (10, 40), (5, 80))", "window_episode": 180, "noise_amplitude_ratio": 0.5, "period_range": [10, 40], "amplitude_range": [5, 80], "can_half_period": true}
================================================
FILE: env.yml
================================================
name: drlts
channels:
- defaults
dependencies:
- ca-certificates=2018.03.07=0
- certifi=2018.4.16=py36_0
- h5py=2.7.1=py36h39cdac5_0
- hdf5=1.10.1=ha036c08_1
- intel-openmp=2018.0.0=8
- keras=2.1.5=py36_0
- libcxx=4.0.1=h579ed51_0
- libcxxabi=4.0.1=hebd6815_0
- libedit=3.1=hb4e282d_0
- libffi=3.2.1=h475c297_4
- libgfortran=3.0.1=h93005f0_2
- libprotobuf=3.5.2=h2cd40f5_0
- mkl=2018.0.2=1
- ncurses=6.0=hd04f020_2
- numpy=1.12.1=py36h8871d66_1
- openssl=1.0.2o=h26aff7b_0
- pandas=0.22.0=py36h0a44026_0
- pip=9.0.3=py36_0
- protobuf=3.5.2=py36h0a44026_0
- python=3.6.5=hc167b69_0
- python-dateutil=2.7.2=py36_0
- pytz=2018.4=py36_0
- pyyaml=3.12=py36h2ba1e63_1
- readline=7.0=hc1231fa_4
- scipy=1.0.1=py36hcaad992_0
- setuptools=39.0.1=py36_0
- six=1.11.0=py36h0e22d5e_1
- sqlite=3.23.1=hf1716c9_0
- tensorflow=1.1.0=np112py36_0
- tk=8.6.7=h35a86e2_3
- werkzeug=0.14.1=py36_0
- wheel=0.31.0=py36_0
- xz=5.2.3=h0278029_2
- yaml=0.1.7=hc338f04_2
- zlib=1.2.11=hf3cbc9b_2
================================================
FILE: src/agents.py
================================================
from lib import *
class Agent:
def __init__(self, model,
batch_size=32, discount_factor=0.95):
self.model = model
self.batch_size = batch_size
self.discount_factor = discount_factor
self.memory = []
def remember(self, state, action, reward, next_state, done, next_valid_actions):
self.memory.append((state, action, reward, next_state, done, next_valid_actions))
def replay(self):
batch = random.sample(self.memory, min(len(self.memory), self.batch_size))
for state, action, reward, next_state, done, next_valid_actions in batch:
q = reward
if not done:
q += self.discount_factor * np.nanmax(self.get_q_valid(next_state, next_valid_actions))
self.model.fit(state, action, q)
def get_q_valid(self, state, valid_actions):
q = self.model.predict(state)
q_valid = [np.nan] * len(q)
for action in valid_actions:
q_valid[action] = q[action]
return q_valid
def act(self, state, exploration, valid_actions):
if np.random.random() > exploration:
q_valid = self.get_q_valid(state, valid_actions)
if np.nanmin(q_valid) != np.nanmax(q_valid):
return np.nanargmax(q_valid)
return random.choice(valid_actions)
def save(self, fld):
makedirs(fld)
attr = {
'batch_size':self.batch_size,
'discount_factor':self.discount_factor,
#'memory':self.memory
}
pickle.dump(attr, open(os.path.join(fld, 'agent_attr.pickle'),'wb'))
self.model.save(fld)
def load(self, fld, learning_rate):
path = os.path.join(fld, 'agent_attr.pickle')
print(path)
attr = pickle.load(open(path,'rb'))
for k in attr:
setattr(self, k, attr[k])
# QModelKeras.load needs the learning rate to re-compile the model
self.model.load(fld, learning_rate)
def add_dim(x, shape):
return np.reshape(x, (1,) + shape)
class QModelKeras:
# ref: https://keon.io/deep-q-learning/
def init(self):
pass
def build_model(self):
pass
def __init__(self, state_shape, n_action):
self.state_shape = state_shape
self.n_action = n_action
self.attr2save = ['state_shape','n_action','model_name']
self.init()
def save(self, fld):
makedirs(fld)
with open(os.path.join(fld, 'model.json'), 'w') as json_file:
json_file.write(self.model.to_json())
self.model.save_weights(os.path.join(fld, 'weights.hdf5'))
attr = dict()
for a in self.attr2save:
attr[a] = getattr(self, a)
pickle.dump(attr, open(os.path.join(fld, 'Qmodel_attr.pickle'),'wb'))
def load(self, fld, learning_rate):
json_str = open(os.path.join(fld, 'model.json')).read()
self.model = keras.models.model_from_json(json_str)
self.model.load_weights(os.path.join(fld, 'weights.hdf5'))
self.model.compile(loss='mse', optimizer=keras.optimizers.Adam(lr=learning_rate))
attr = pickle.load(open(os.path.join(fld, 'Qmodel_attr.pickle'), 'rb'))
for a in attr:
setattr(self, a, attr[a])
def predict(self, state):
q = self.model.predict(
add_dim(state, self.state_shape)
)[0]
if np.isnan(max(q)):
print('state'+str(state))
print('q'+str(q))
raise ValueError
return q
def fit(self, state, action, q_action):
q = self.predict(state)
q[action] = q_action
self.model.fit(
add_dim(state, self.state_shape),
add_dim(q, (self.n_action,)),
epochs=1, verbose=0)
class QModelMLP(QModelKeras):
# multi-layer perceptron (MLP), i.e., dense layers only
def init(self):
self.qmodel = 'MLP'
def build_model(self, n_hidden, learning_rate, activation='relu'):
model = keras.models.Sequential()
model.add(keras.layers.Reshape(
(self.state_shape[0]*self.state_shape[1],),
input_shape=self.state_shape))
for i in range(len(n_hidden)):
model.add(keras.layers.Dense(n_hidden[i], activation=activation))
#model.add(keras.layers.Dropout(drop_rate))
model.add(keras.layers.Dense(self.n_action, activation='linear'))
model.compile(loss='mse', optimizer=keras.optimizers.Adam(lr=learning_rate))
self.model = model
self.model_name = self.qmodel + str(n_hidden)
class QModelRNN(QModelKeras):
"""
https://keras.io/getting-started/sequential-model-guide/#example
note param doesn't grow with len of sequence
"""
def _build_model(self, Layer, n_hidden, dense_units, learning_rate, activation='relu'):
model = keras.models.Sequential()
model.add(keras.layers.Reshape(self.state_shape, input_shape=self.state_shape))
m = len(n_hidden)
for i in range(m):
model.add(Layer(n_hidden[i],
return_sequences=(i<m-1)))
for i in range(len(dense_units)):
model.add(keras.layers.Dense(dense_units[i], activation=activation))
model.add(keras.layers.Dense(self.n_action, activation='linear'))
model.compile(loss='mse', optimizer=keras.optimizers.Adam(lr=learning_rate))
self.model = model
self.model_name = self.qmodel + str(n_hidden) + str(dense_units)
class QModelLSTM(QModelRNN):
def init(self):
self.qmodel = 'LSTM'
def build_model(self, n_hidden, dense_units, learning_rate, activation='relu'):
Layer = keras.layers.LSTM
self._build_model(Layer, n_hidden, dense_units, learning_rate, activation)
class QModelGRU(QModelRNN):
def init(self):
self.qmodel = 'GRU'
def build_model(self, n_hidden, dense_units, learning_rate, activation='relu'):
Layer = keras.layers.GRU
self._build_model(Layer, n_hidden, dense_units, learning_rate, activation)
class QModelConv(QModelKeras):
"""
ref: https://keras.io/layers/convolutional/
"""
def init(self):
self.qmodel = 'Conv'
def build_model(self,
filter_num, filter_size, dense_units,
learning_rate, activation='relu', dilation=None, use_pool=None):
if use_pool is None:
use_pool = [True]*len(filter_num)
if dilation is None:
dilation = [1]*len(filter_num)
model = keras.models.Sequential()
model.add(keras.layers.Reshape(self.state_shape, input_shape=self.state_shape))
for i in range(len(filter_num)):
model.add(keras.layers.Conv1D(filter_num[i], kernel_size=filter_size[i], dilation_rate=dilation[i],
activation=activation, use_bias=True))
if use_pool[i]:
model.add(keras.layers.MaxPooling1D(pool_size=2))
model.add(keras.layers.Flatten())
for i in range(len(dense_units)):
model.add(keras.layers.Dense(dense_units[i], activation=activation))
model.add(keras.layers.Dense(self.n_action, activation='linear'))
model.compile(loss='mse', optimizer=keras.optimizers.Adam(lr=learning_rate))
self.model = model
self.model_name = self.qmodel + str([a for a in
zip(filter_num, filter_size, dilation, use_pool)
])+' + '+str(dense_units)
class QModelConvRNN(QModelKeras):
"""
https://keras.io/getting-started/sequential-model-guide/#example
note: the number of parameters does not grow with the sequence length
"""
def _build_model(self, RNNLayer, conv_n_hidden, RNN_n_hidden, dense_units, learning_rate,
conv_kernel_size=3, use_pool=False, activation='relu'):
model = keras.models.Sequential()
model.add(keras.layers.Reshape(self.state_shape, input_shape=self.state_shape))
for i in range(len(conv_n_hidden)):
model.add(keras.layers.Conv1D(conv_n_hidden[i], kernel_size=conv_kernel_size,
activation=activation, use_bias=True))
if use_pool:
model.add(keras.layers.MaxPooling1D(pool_size=2))
m = len(RNN_n_hidden)
for i in range(m):
model.add(RNNLayer(RNN_n_hidden[i],
return_sequences=(i<m-1)))
for i in range(len(dense_units)):
model.add(keras.layers.Dense(dense_units[i], activation=activation))
model.add(keras.layers.Dense(self.n_action, activation='linear'))
model.compile(loss='mse', optimizer=keras.optimizers.Adam(lr=learning_rate))
self.model = model
self.model_name = self.qmodel + str(conv_n_hidden) + str(RNN_n_hidden) + str(dense_units)
class QModelConvLSTM(QModelConvRNN):
def init(self):
self.qmodel = 'ConvLSTM'
def build_model(self, conv_n_hidden, RNN_n_hidden, dense_units, learning_rate,
conv_kernel_size=3, use_pool=False, activation='relu'):
Layer = keras.layers.LSTM
self._build_model(Layer, conv_n_hidden, RNN_n_hidden, dense_units, learning_rate,
conv_kernel_size, use_pool, activation)
class QModelConvGRU(QModelConvRNN):
def init(self):
self.qmodel = 'ConvGRU'
def build_model(self, conv_n_hidden, RNN_n_hidden, dense_units, learning_rate,
conv_kernel_size=3, use_pool=False, activation='relu'):
Layer = keras.layers.GRU
self._build_model(Layer, conv_n_hidden, RNN_n_hidden, dense_units, learning_rate,
conv_kernel_size, use_pool, activation)
def load_model(fld, learning_rate):
s = open(os.path.join(fld,'QModel.txt'),'r').read().strip()
qmodels = {
'Conv':QModelConv,
'DenseOnly':QModelMLP,
'MLP':QModelMLP,
'LSTM':QModelLSTM,
'GRU':QModelGRU,
'ConvLSTM':QModelConvLSTM,
'ConvGRU':QModelConvGRU,
}
qmodel = qmodels[s](None, None)
qmodel.load(fld, learning_rate)
return qmodel
================================================
FILE: src/emulator.py
================================================
from lib import *
# by Xiang Gao, 2018
def find_ideal(p, just_once):
if not just_once:
# ideal profit with unlimited round trips: sum of all positive price moves
diff = np.array(p[1:]) - np.array(p[:-1])
return sum(np.maximum(np.zeros(diff.shape), diff))
else:
# ideal profit of a single buy-sell round trip
best = 0.
for i in range(len(p)-1):
best = max(best, max(p[i+1:]) - p[i])
return best
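# Example (illustrative values; a quick sanity check of the two modes):
# find_ideal([1., 2., 3., 2., 1., 2., 3.], just_once=False) -> 4.0 (sum of the two +2 rises)
# find_ideal([1., 2., 3., 2., 1., 2., 3.], just_once=True) -> 2.0 (best single round trip)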
class Market:
"""
state MA of prices, normalized using values at t
ndarray of shape (window_state, n_instruments * n_MA), i.e., 2D
which is self.state_shape
action three action
0: empty, don't open/close.
1: open a position
2: keep a position
"""
def reset(self, rand_price=True):
self.empty = True
if rand_price:
prices, self.title = self.sampler.sample()
price = np.reshape(prices[:,0], prices.shape[0])
self.prices = prices.copy()
self.price = price/price[0]*100
self.t_max = len(self.price) - 1
self.max_profit = find_ideal(self.price[self.t0:], False)
self.t = self.t0
return self.get_state(), self.get_valid_actions()
def get_state(self, t=None):
if t is None:
t = self.t
state = self.prices[t - self.window_state + 1: t + 1, :].copy()
for i in range(self.sampler.n_var):
norm = np.mean(state[:,i])
state[:,i] = (state[:,i]/norm - 1.)*100
return state
def get_valid_actions(self):
if self.empty:
return [0, 1] # wait, open
else:
return [0, 2] # close, keep
def get_noncash_reward(self, t=None, empty=None):
if t is None:
t = self.t
if empty is None:
empty = self.empty
reward = self.direction * (self.price[t+1] - self.price[t])
if empty:
reward -= self.open_cost
if reward < 0:
reward *= (1. + self.risk_averse)
return reward
def step(self, action):
done = False
if action == 0: # wait/close
reward = 0.
self.empty = True
elif action == 1: # open
reward = self.get_noncash_reward()
self.empty = False
elif action == 2: # keep
reward = self.get_noncash_reward()
else:
raise ValueError('no such action: '+str(action))
self.t += 1
return self.get_state(), reward, self.t == self.t_max, self.get_valid_actions()
def __init__(self,
sampler, window_state, open_cost,
direction=1., risk_averse=0.):
self.sampler = sampler
self.window_state = window_state
self.open_cost = open_cost
self.direction = direction
self.risk_averse = risk_averse
self.n_action = 3
self.state_shape = (window_state, self.sampler.n_var)
self.action_labels = ['empty','open','keep']
self.t0 = window_state - 1
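# Minimal interaction sketch (illustrative only; assumes a sampler database
# built beforehand with src/sampler.py):
# sampler = SinSampler('load', fld='../data/SinSamplerDB/concat_half_base_A')
# env = Market(sampler, window_state=40, open_cost=3.3)
# state, valid_actions = env.reset()
# done = False
# while not done:
#     action = valid_actions[0]  # an agent would choose among valid_actions here
#     state, reward, done, valid_actions = env.step(action)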
if __name__ == '__main__':
pass # test_env() was called here, but it is not defined in this repo
================================================
FILE: src/lib.py
================================================
import random, os, datetime, pickle, json, keras, sys
import pandas as pd
#import matplotlib.pyplot as plt
import numpy as np
OUTPUT_FLD = os.path.join('..','results')
PRICE_FLD = '/Users/xianggao/Dropbox/distributed/code_db/price coinbase/vm-w7r-db'
def makedirs(fld):
if not os.path.exists(fld):
os.makedirs(fld)
================================================
FILE: src/main.py
================================================
#!/usr/bin/env python3
from lib import *
from sampler import *
from agents import *
from emulator import *
from simulators import *
from visualizer import *
def get_model(model_type, env, learning_rate, fld_load):
print_t = False
exploration_init = 1.
if model_type == 'MLP':
m = 16
layers = 5
hidden_size = [m]*layers
model = QModelMLP(env.state_shape, env.n_action)
model.build_model(hidden_size, learning_rate=learning_rate, activation='tanh')
elif model_type == 'conv':
m = 16
layers = 2
filter_num = [m]*layers
filter_size = [3] * len(filter_num)
#use_pool = [False, True, False, True]
#use_pool = [False, False, True, False, False, True]
use_pool = None
#dilation = [1,2,4,8]
dilation = None
dense_units = [48,24]
model = QModelConv(env.state_shape, env.n_action)
model.build_model(filter_num, filter_size, dense_units, learning_rate,
dilation=dilation, use_pool=use_pool)
elif model_type == 'RNN':
m = 32
layers = 3
hidden_size = [m]*layers
dense_units = [m,m]
model = QModelGRU(env.state_shape, env.n_action)
model.build_model(hidden_size, dense_units, learning_rate=learning_rate)
print_t = True
elif model_type == 'ConvRNN':
m = 8
conv_n_hidden = [m,m]
RNN_n_hidden = [m,m]
dense_units = [m,m]
model = QModelConvGRU(env.state_shape, env.n_action)
model.build_model(conv_n_hidden, RNN_n_hidden, dense_units, learning_rate=learning_rate)
print_t = True
elif model_type == 'pretrained':
model = load_model(fld_load, learning_rate)
else:
raise ValueError
return model, print_t
def main():
"""
it is recommended to generate a database using sampler.py before running main
"""
model_type = 'conv'; exploration_init = 1.; fld_load = None
n_episode_training = 1000
n_episode_testing = 100
open_cost = 3.3
#db_type = 'SinSamplerDB'; db = 'concat_half_base_'; Sampler = SinSampler
db_type = 'PairSamplerDB'; db = 'randjump_100,1(10, 30)[]_'; Sampler = PairSampler
batch_size = 8
learning_rate = 1e-4
discount_factor = 0.8
exploration_decay = 0.99
exploration_min = 0.01
window_state = 40
fld = os.path.join('..','data',db_type,db+'A')
sampler = Sampler('load', fld=fld)
env = Market(sampler, window_state, open_cost)
model, print_t = get_model(model_type, env, learning_rate, fld_load)
model.model.summary()
#return
agent = Agent(model, discount_factor=discount_factor, batch_size=batch_size)
visualizer = Visualizer(env.action_labels)
fld_save = os.path.join(OUTPUT_FLD, sampler.title, model.model_name,
str((env.window_state, sampler.window_episode, agent.batch_size, learning_rate,
agent.discount_factor, exploration_decay, env.open_cost)))
print('='*20)
print(fld_save)
print('='*20)
simulator = Simulator(agent, env, visualizer=visualizer, fld_save=fld_save)
simulator.train(n_episode_training, save_per_episode=1, exploration_decay=exploration_decay,
exploration_min=exploration_min, print_t=print_t, exploration_init=exploration_init)
#agent.model = load_model(os.path.join(fld_save,'model'), learning_rate)
#print('='*20+'\nin-sample testing\n'+'='*20)
simulator.test(n_episode_testing, save_per_episode=1, subfld='in-sample testing')
"""
fld = os.path.join('data',db_type,db+'B')
sampler = SinSampler('load',fld=fld)
simulator.env.sampler = sampler
simulator.test(n_episode_testing, save_per_episode=1, subfld='out-of-sample testing')
"""
if __name__ == '__main__':
main()
================================================
FILE: src/sampler.py
================================================
from lib import *
def read_data(date, instrument, time_step):
path = os.path.join(PRICE_FLD, date, instrument+'.csv')
if not os.path.exists(path):
print('no such file: '+path)
return None
df_raw = pd.read_csv(path, parse_dates=['time'], index_col='time')
df = df_raw.resample(time_step).last().fillna(method='ffill')
return df['spot'].values
class Sampler:
def load_db(self, fld):
self.db = pickle.load(open(os.path.join(fld, 'db.pickle'),'rb'))
param = json.load(open(os.path.join(fld, 'param.json'),'rb'))
self.i_db = 0
self.n_db = param['n_episodes']
self.sample = self.__sample_db
for attr in param:
if hasattr(self, attr):
setattr(self, attr, param[attr])
self.title = 'DB_'+param['title']
def build_db(self, n_episodes, fld):
db = []
for i in range(n_episodes):
prices, title = self.sample()
db.append((prices, '[%i]_'%i+title))
os.makedirs(fld) # don't overwrite existing fld
pickle.dump(db, open(os.path.join(fld, 'db.pickle'),'wb'))
param = {'n_episodes':n_episodes}
for k in self.attrs:
param[k] = getattr(self, k)
json.dump(param, open(os.path.join(fld, 'param.json'),'w'))
def __sample_db(self):
prices, title = self.db[self.i_db]
self.i_db += 1
if self.i_db == self.n_db:
self.i_db = 0
return prices, title
class PairSampler(Sampler):
def __init__(self, game,
window_episode=None, forecast_horizon_range=None, max_change_perc=10., noise_level=10., n_section=1,
fld=None, windows_transform=[]):
self.window_episode = window_episode
self.forecast_horizon_range = forecast_horizon_range
self.max_change_perc = max_change_perc
self.noise_level = noise_level
self.n_section = n_section
self.windows_transform = windows_transform
self.n_var = 2 + len(self.windows_transform) # price, signal
self.attrs = ['title', 'window_episode', 'forecast_horizon_range',
'max_change_perc', 'noise_level', 'n_section', 'n_var']
param_str = str((self.noise_level, self.forecast_horizon_range, self.n_section, self.windows_transform))
if game == 'load':
self.load_db(fld)
elif game in ['randwalk','randjump']:
self.__rand = getattr(self, '_PairSampler__'+game)
self.sample = self.__sample
self.title = game + param_str
else:
raise ValueError
def __randwalk(self, l):
change = (np.random.random(l + self.forecast_horizon_range[1]) - 0.5) * 2 * self.max_change_perc/100
forecast_horizon = random.randrange(self.forecast_horizon_range[0], self.forecast_horizon_range[1])
return change[:l], change[forecast_horizon: forecast_horizon + l], forecast_horizon
def __randjump(self, l):
change = [0.] * (l + self.forecast_horizon_range[1])
n_jump = random.randrange(15,30)
for i in range(n_jump):
t = random.randrange(len(change))
change[t] = (np.random.random() - 0.5) * 2 * self.max_change_perc/100
forecast_horizon = random.randrange(self.forecast_horizon_range[0], self.forecast_horizon_range[1])
return change[:l], change[forecast_horizon: forecast_horizon + l], forecast_horizon
def __sample(self):
L = self.window_episode
if bool(self.windows_transform):
L += max(self.windows_transform)
l0 = L // self.n_section # integer division; L/n_section would be a float in Python 3
l1 = L
d_price = []
d_signal = []
forecast_horizon = []
for i in range(self.n_section):
if i == self.n_section - 1:
l = l1
else:
l = l0
l1 -= l0
d_price_i, d_signal_i, horizon_i = self.__rand(l)
d_price = np.append(d_price, d_price_i)
d_signal = np.append(d_signal, d_signal_i)
forecast_horizon.append(horizon_i)
price = 100. * (1. + np.cumsum(d_price))
signal = 100. * (1. + np.cumsum(d_signal)) + \
np.random.random(len(price)) * self.noise_level
price += (100 - min(price))
signal += (100 - min(signal))
inputs = [price[-self.window_episode:], signal[-self.window_episode:]]
for w in self.windows_transform:
inputs.append(signal[-self.window_episode - w: -w])
return np.array(inputs).T, 'forecast_horizon='+str(forecast_horizon)
class SinSampler(Sampler):
def __init__(self, game,
window_episode=None, noise_amplitude_ratio=None, period_range=None, amplitude_range=None,
fld=None):
self.n_var = 1 # price only
self.window_episode = window_episode
self.noise_amplitude_ratio = noise_amplitude_ratio
self.period_range = period_range
self.amplitude_range = amplitude_range
self.can_half_period = False
self.attrs = ['title','window_episode', 'noise_amplitude_ratio', 'period_range', 'amplitude_range', 'can_half_period']
param_str = str((
self.noise_amplitude_ratio, self.period_range, self.amplitude_range
))
if game == 'single':
self.sample = self.__sample_single_sin
self.title = 'SingleSin'+param_str
elif game == 'concat':
self.sample = self.__sample_concat_sin
self.title = 'ConcatSin'+param_str
elif game == 'concat_half':
self.can_half_period = True
self.sample = self.__sample_concat_sin
self.title = 'ConcatHalfSin'+param_str
elif game == 'concat_half_base':
self.can_half_period = True
self.sample = self.__sample_concat_sin_w_base
self.title = 'ConcatHalfSin+Base'+param_str
self.base_period_range = (int(2*self.period_range[1]), 4*self.period_range[1])
self.base_amplitude_range = (20,80)
elif game == 'load':
self.load_db(fld)
else:
raise ValueError
def __rand_sin(self,
period_range=None, amplitude_range=None, noise_amplitude_ratio=None, full_episode=False):
if period_range is None:
period_range = self.period_range
if amplitude_range is None:
amplitude_range = self.amplitude_range
if noise_amplitude_ratio is None:
noise_amplitude_ratio = self.noise_amplitude_ratio
period = random.randrange(period_range[0], period_range[1])
amplitude = random.randrange(amplitude_range[0], amplitude_range[1])
noise = noise_amplitude_ratio * amplitude
if full_episode:
length = self.window_episode
else:
if self.can_half_period:
length = int(random.randrange(1,4) * 0.5 * period)
else:
length = period
p = 100. + amplitude * np.sin(np.arange(length) * 2 * np.pi / period)
p += np.random.random(p.shape) * noise
return p, '100+%isin((2pi/%i)t)+%ie'%(amplitude, period, noise)
def __sample_concat_sin(self):
prices = []
p = []
while True:
p = np.append(p, self.__rand_sin(full_episode=False)[0])
if len(p) > self.window_episode:
break
prices.append(p[:self.window_episode])
return np.array(prices).T, 'concat sin'
def __sample_concat_sin_w_base(self):
prices = []
p = []
while True:
p = np.append(p, self.__rand_sin(full_episode=False)[0])
if len(p) > self.window_episode:
break
base, base_title = self.__rand_sin(
period_range=self.base_period_range,
amplitude_range=self.base_amplitude_range,
noise_amplitude_ratio=0.,
full_episode=True)
prices.append(p[:self.window_episode] + base)
return np.array(prices).T, 'concat sin + base: '+base_title
def __sample_single_sin(self):
prices = []
funcs = []
p, func = self.__rand_sin(full_episode=True)
prices.append(p)
funcs.append(func)
return np.array(prices).T, str(funcs)
def test_SinSampler():
window_episode = 180
window_state = 40
noise_amplitude_ratio = 0.5
period_range = (10,40)
amplitude_range = (5,80)
game = 'concat_half_base'
instruments = ['fake']
sampler = SinSampler(game,
window_episode, noise_amplitude_ratio, period_range, amplitude_range)
n_episodes = 100
"""
for i in range(100):
plt.plot(sampler.sample(instruments)[0])
plt.show()
"""
fld = os.path.join('data','SinSamplerDB',game+'_B')
sampler.build_db(n_episodes, fld)
def test_PairSampler():
fhr = (10,30)
n_section = 1
max_change_perc = 30.
noise_level = 5
game = 'randjump'
windows_transform = []
sampler = PairSampler(game, window_episode=180, forecast_horizon_range=fhr,
n_section=n_section, noise_level=noise_level, max_change_perc=max_change_perc, windows_transform=windows_transform)
#plt.plot(sampler.sample()[0]);plt.show()
#"""
n_episodes = 100
fld = os.path.join('data','PairSamplerDB',
game+'_%i,%i'%(n_episodes, n_section)+str(fhr)+str(windows_transform)+'_B')
sampler.build_db(n_episodes, fld)
#"""
if __name__ == '__main__':
#scan_match()
test_SinSampler()
#p = [1,2,3,2,1,2,3]
#print find_ideal(p)
test_PairSampler()
================================================
FILE: src/simulators.py
================================================
from lib import *
class Simulator:
def play_one_episode(self, exploration, training=True, rand_price=True, print_t=False):
state, valid_actions = self.env.reset(rand_price=rand_price)
done = False
env_t = 0
try:
env_t = self.env.t
except AttributeError:
pass
cum_rewards = [np.nan] * env_t
actions = [np.nan] * env_t
states = [None] * env_t
prev_cum_rewards = 0.
while not done:
if print_t:
print(self.env.t)
action = self.agent.act(state, exploration, valid_actions)
next_state, reward, done, valid_actions = self.env.step(action)
cum_rewards.append(prev_cum_rewards+reward)
prev_cum_rewards = cum_rewards[-1]
actions.append(action)
states.append(next_state)
if training:
self.agent.remember(state, action, reward, next_state, done, valid_actions)
self.agent.replay()
state = next_state
return cum_rewards, actions, states
def train(self, n_episode,
save_per_episode=10, exploration_decay=0.995, exploration_min=0.01, print_t=False, exploration_init=1.):
fld_model = os.path.join(self.fld_save,'model')
makedirs(fld_model) # don't overwrite if already exists
with open(os.path.join(fld_model,'QModel.txt'),'w') as f:
f.write(self.agent.model.qmodel)
exploration = exploration_init
fld_save = os.path.join(self.fld_save,'training')
makedirs(fld_save)
MA_window = 100 # MA of performance
safe_total_rewards = []
explored_total_rewards = []
explorations = []
path_record = os.path.join(fld_save,'record.csv')
with open(path_record,'w') as f:
f.write('episode,game,exploration,explored,safe,MA_explored,MA_safe\n')
for n in range(n_episode):
print('\ntraining...')
exploration = max(exploration_min, exploration * exploration_decay)
explorations.append(exploration)
explored_cum_rewards, explored_actions, _ = self.play_one_episode(exploration, print_t=print_t)
explored_total_rewards.append(100.*explored_cum_rewards[-1]/self.env.max_profit)
safe_cum_rewards, safe_actions, _ = self.play_one_episode(0, training=False, rand_price=False, print_t=False)
safe_total_rewards.append(100.*safe_cum_rewards[-1]/self.env.max_profit)
MA_total_rewards = np.median(explored_total_rewards[-MA_window:])
MA_safe_total_rewards = np.median(safe_total_rewards[-MA_window:])
ss = [
str(n), self.env.title.replace(',',';'), '%.1f'%(exploration*100.),
'%.1f'%(explored_total_rewards[-1]), '%.1f'%(safe_total_rewards[-1]),
'%.1f'%MA_total_rewards, '%.1f'%MA_safe_total_rewards,
]
with open(path_record,'a') as f:
f.write(','.join(ss)+'\n')
print('\t'.join(ss))
if n%save_per_episode == 0:
print('saving results...')
self.agent.save(fld_model)
"""
self.visualizer.plot_a_episode(
self.env, self.agent.model,
explored_cum_rewards, explored_actions,
safe_cum_rewards, safe_actions,
os.path.join(fld_save, 'episode_%i.png'%(n)))
self.visualizer.plot_episodes(
explored_total_rewards, safe_total_rewards, explorations,
os.path.join(fld_save, 'total_rewards.png'),
MA_window)
"""
	def test(self, n_episode, save_per_episode=10, subfld='testing'):

		fld_save = os.path.join(self.fld_save, subfld)
		makedirs(fld_save)
		MA_window = 100		# window for the moving median of performance
		safe_total_rewards = []
		path_record = os.path.join(fld_save, 'record.csv')

		with open(path_record, 'w') as f:
			f.write('episode,game,pnl,rel,MA\n')

		for n in range(n_episode):
			print('\ntesting...')
			safe_cum_rewards, safe_actions, _ = self.play_one_episode(
				0, training=False, rand_price=True)
			safe_total_rewards.append(100.*safe_cum_rewards[-1]/self.env.max_profit)
			MA_safe_total_rewards = np.median(safe_total_rewards[-MA_window:])

			ss = [str(n), self.env.title.replace(',', ';'),
				'%.1f'%(safe_cum_rewards[-1]),
				'%.1f'%(safe_total_rewards[-1]),
				'%.1f'%MA_safe_total_rewards]
			with open(path_record, 'a') as f:
				f.write(','.join(ss)+'\n')
			print('\t'.join(ss))

			if n%save_per_episode == 0:
				print('saving results...')
				"""
				self.visualizer.plot_a_episode(
					self.env, self.agent.model,
					[np.nan]*len(safe_cum_rewards), [np.nan]*len(safe_actions),
					safe_cum_rewards, safe_actions,
					os.path.join(fld_save, 'episode_%i.png'%(n)))
				self.visualizer.plot_episodes(
					None, safe_total_rewards, None,
					os.path.join(fld_save, 'total_rewards.png'),
					MA_window)
				"""

	def __init__(self, agent, env, visualizer, fld_save):
		self.agent = agent
		self.env = env
		self.visualizer = visualizer
		self.fld_save = fld_save


if __name__ == '__main__':
	#print 'episode%i, init%i'%(1,2)
	a = [1, 2, 3]
	print(np.mean(a[-100:]))
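Both `train` and `test` above track performance as the median total reward over a trailing window of `MA_window = 100` episodes. A minimal standalone sketch of that bookkeeping, with made-up reward values in place of `100.*safe_cum_rewards[-1]/env.max_profit`:

```python
import numpy as np

MA_window = 100  # same trailing-window size as Simulator.train/test

safe_total_rewards = []
for n in range(5):
    # stand-in for the per-episode normalized reward
    safe_total_rewards.append(float(n * 10))
    MA_safe_total_rewards = np.median(safe_total_rewards[-MA_window:])

print(MA_safe_total_rewards)  # median of [0, 10, 20, 30, 40] -> 20.0
```

Because the slice `[-MA_window:]` simply takes the whole list while fewer than 100 episodes have run, the statistic is well defined from the first episode on.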
================================================
FILE: src/visualizer.py
================================================
from lib import *


def get_tick_labels(bins, ticks):
	ticklabels = []
	for i in ticks:
		if i < len(bins):
			ticklabels.append('%.2f'%(bins[int(i)]))
		else:
			ticklabels.append('%.2f'%(bins[-1])+'+')
	return ticklabels
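The helper maps tick positions to bin-value labels, clamping out-of-range ticks to the last bin with a `+` suffix. A standalone copy with hypothetical bins, runnable on its own:

```python
def get_tick_labels(bins, ticks):
    # standalone copy of the helper above, for illustration
    ticklabels = []
    for i in ticks:
        if i < len(bins):
            ticklabels.append('%.2f' % (bins[int(i)]))
        else:
            ticklabels.append('%.2f' % (bins[-1]) + '+')
    return ticklabels

labels = get_tick_labels([0.5, 1.0, 1.5], [0, 2, 4])
print(labels)  # ['0.50', '1.50', '1.50+']
```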
class Visualizer:

	def __init__(self, action_labels):
		self.n_action = len(action_labels)
		self.action_labels = action_labels

	def plot_a_episode(self,
		env, model,
		explored_cum_rewards, explored_actions,
		safe_cum_rewards, safe_actions,
		fig_path):

		f, axs = plt.subplots(3, 1, sharex=True, figsize=(14,14))
		ax_price, ax_action, ax_Q = axs

		ls = ['-', '--']
		for i in range(min(2, env.prices.shape[1])):
			p = env.prices[:,i]/env.prices[0,i]*100 - 100
			ax_price.plot(p, 'k'+ls[i], label='input%i - 100'%i)

		ax_price.plot(explored_cum_rewards, 'b', label='explored P&L')
		ax_price.plot(safe_cum_rewards, 'r', label='safe P&L')
		ax_price.legend(loc='best', frameon=False)
		ax_price.set_title(env.title+', ideal: %.1f, safe: %.1f, explored: %.1f'%(
			env.max_profit, safe_cum_rewards[-1], explored_cum_rewards[-1]))

		ax_action.plot(explored_actions, 'b', label='explored')
		ax_action.plot(safe_actions, 'r', label='safe', linewidth=2)
		ax_action.set_ylim(-0.4, self.n_action-0.6)
		ax_action.set_ylabel('action')
		ax_action.set_yticks(range(self.n_action))
		ax_action.legend(loc='best', frameon=False)

		style = ['k', 'r', 'b']
		qq = []
		# range, not the Python-2-only xrange
		for t in range(env.t0):
			qq.append([np.nan] * self.n_action)
		for t in range(env.t0, env.t_max):
			qq.append(model.predict(env.get_state(t)))
		for i in range(self.n_action):
			ax_Q.plot([float(qq[t][i]) for t in range(len(qq))],
				style[i], label=self.action_labels[i])
		ax_Q.set_ylabel('Q')
		ax_Q.legend(loc='best', frameon=False)
		ax_Q.set_xlabel('t')

		plt.subplots_adjust(wspace=0.4)
		plt.savefig(fig_path)
		plt.close()
	def plot_episodes(self,
		explored_total_rewards, safe_total_rewards, explorations,
		fig_path, MA_window=100):

		f = plt.figure(figsize=(14,10))	# width, height in inches
		if explored_total_rewards is None:
			f, ax_reward = plt.subplots()
		else:
			figshape = (3,1)
			ax_reward = plt.subplot2grid(figshape, (0,0), rowspan=2)
			ax_exploration = plt.subplot2grid(figshape, (2,0), sharex=ax_reward)
		tt = range(len(safe_total_rewards))

		# pd.rolling_median/pd.rolling_std were removed in pandas 0.23;
		# the Series.rolling API is the equivalent
		if explored_total_rewards is not None:
			explored = pd.Series(explored_total_rewards)
			ma = explored.rolling(window=MA_window, min_periods=1).median()
			std = explored.rolling(window=MA_window, min_periods=3).std()
			ax_reward.plot(tt, explored_total_rewards, 'bv', fillstyle='none')
			ax_reward.plot(tt, ma, 'b', label='explored ma', linewidth=2)
			ax_reward.plot(tt, std, 'b--', label='explored std', linewidth=2)

		safe = pd.Series(safe_total_rewards)
		ma = safe.rolling(window=MA_window, min_periods=1).median()
		std = safe.rolling(window=MA_window, min_periods=3).std()
		ax_reward.plot(tt, safe_total_rewards, 'ro', fillstyle='none')
		ax_reward.plot(tt, ma, 'r', label='safe ma', linewidth=2)
		ax_reward.plot(tt, std, 'r--', label='safe std', linewidth=2)

		ax_reward.axhline(y=0, color='k', linestyle=':')
		#ax_reward.axhline(y=60, color='k', linestyle=':')
		ax_reward.set_ylabel('total reward')
		ax_reward.legend(loc='best', frameon=False)
		ax_reward.yaxis.tick_right()
		ylim = ax_reward.get_ylim()
		ax_reward.set_ylim((max(-100, ylim[0]), min(100, ylim[1])))

		if explored_total_rewards is not None:
			ax_exploration.plot(tt, np.array(explorations)*100., 'k')
			ax_exploration.set_ylabel('exploration')
			ax_exploration.set_xlabel('episode')

		plt.savefig(fig_path)
		plt.close()
def test_visualizer():
	f = plt.figure()	#figsize=(5,8)
	axs_action = []
	ncol = 3
	nrow = 2
	clim = (0,1)

	ax = plt.subplot2grid((nrow, ncol), (0, ncol-1))
	ax.matshow(np.random.random((2,2)), cmap='RdYlBu_r', clim=clim)
	for action in range(3):
		row = 1 + action//ncol	# integer division: grid indices must be ints
		col = action%ncol
		ax = plt.subplot2grid((nrow, ncol), (row, col))
		cax = ax.matshow(np.random.random((2,2)), cmap='RdYlBu_r', clim=clim)

	ax = plt.subplot2grid((nrow, ncol), (0,0), colspan=ncol-1)
	cbar = f.colorbar(cax, ax=ax)
	plt.show()
class VisualizerSequential:

	def config(self):
		pass

	def __init__(self, model):
		self.model = model
		self.layers = []
		for layer in self.model.layers:
			self.layers.append(str(layer.name))

		# one sub-model per layer, exposing that layer's output
		self.inter_models = dict()
		model_input = self.model.input
		for layer in self.layers:
			self.inter_models[layer] = keras.models.Model(
				inputs=model_input,
				outputs=self.model.get_layer(layer).output)
		self.config()
class VisualizerConv1D(VisualizerSequential):

	def config(self):
		# cast: TF may report dimensions as Dimension objects, not ints
		self.n_channel = int(self.model.input.shape[2])
		n_col = self.n_channel
		for layer in self.layers:
			shape = self.inter_models[layer].output.shape
			if len(shape) == 3:
				n_col = max(n_col, int(shape[2]))
		self.figshape = (len(self.layers)+1, int(n_col))

	def plot(self, x):
		f = plt.figure(figsize=(30,30))
		for i in range(self.n_channel):
			ax = plt.subplot2grid(self.figshape, (0,i))
			ax.plot(x[0,:,i], '.-')
			ax.set_title('input, channel %i'%i)

		for i_layer in range(len(self.layers)):
			layer = self.layers[i_layer]
			z = self.inter_models[layer].predict(x)
			print('plotting '+layer)
			if len(z.shape) == 3:
				for i in range(z.shape[2]):
					ax = plt.subplot2grid(self.figshape, (i_layer+1, i))
					ax.plot(z[0,:,i], '.-')
					ax.set_title(layer+' filter %i'%i)
			else:
				ax = plt.subplot2grid(self.figshape, (i_layer+1, 0))
				ax.plot(z[0,:], '.-')
				ax.set_title(layer)
				ax.set_ylim(-100, 100)

	def print_w(self):
		layer = self.layers[0]
		ww = self.inter_models[layer].get_weights()
		for w in ww:
			print(w.shape)
			print(w)
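The moving statistics in `plot_episodes` were historically computed with `pd.rolling_median` and `pd.rolling_std`, which were removed in pandas 0.23; the `Series.rolling` API is the modern equivalent. A minimal sketch with made-up reward values (`min_periods=1` so early points are defined, as in the plot code):

```python
import pandas as pd

vals = [1.0, 5.0, 3.0, 7.0, 2.0]  # hypothetical per-episode rewards
ma = pd.Series(vals).rolling(window=3, min_periods=1).median()
print(ma.tolist())  # [1.0, 3.0, 3.0, 5.0, 3.0]
```

Each output point is the median of the current value and up to two predecessors, which is why the curve smooths spikes (e.g. the 7.0) instead of averaging them in.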
================================================
SYMBOL INDEX (87 symbols across 7 files)
================================================
FILE: src/agents.py
class Agent (line 3) | class Agent:
method __init__ (line 5) | def __init__(self, model,
method remember (line 14) | def remember(self, state, action, reward, next_state, done, next_valid...
method replay (line 18) | def replay(self):
method get_q_valid (line 27) | def get_q_valid(self, state, valid_actions):
method act (line 35) | def act(self, state, exploration, valid_actions):
method save (line 43) | def save(self, fld):
method load (line 55) | def load(self, fld):
function add_dim (line 64) | def add_dim(x, shape):
class QModelKeras (line 69) | class QModelKeras:
method init (line 72) | def init(self):
method build_model (line 75) | def build_model(self):
method __init__ (line 78) | def __init__(self, state_shape, n_action):
method save (line 85) | def save(self, fld):
method load (line 96) | def load(self, fld, learning_rate):
method predict (line 106) | def predict(self, state):
method fit (line 118) | def fit(self, state, action, q_action):
class QModelMLP (line 129) | class QModelMLP(QModelKeras):
method init (line 132) | def init(self):
method build_model (line 135) | def build_model(self, n_hidden, learning_rate, activation='relu'):
class QModelRNN (line 153) | class QModelRNN(QModelKeras):
method _build_model (line 159) | def _build_model(self, Layer, n_hidden, dense_units, learning_rate, ac...
class QModelLSTM (line 176) | class QModelLSTM(QModelRNN):
method init (line 177) | def init(self):
method build_model (line 179) | def build_model(self, n_hidden, dense_units, learning_rate, activation...
class QModelGRU (line 184) | class QModelGRU(QModelRNN):
method init (line 185) | def init(self):
method build_model (line 187) | def build_model(self, n_hidden, dense_units, learning_rate, activation...
class QModelConv (line 193) | class QModelConv(QModelKeras):
method init (line 197) | def init(self):
method build_model (line 200) | def build_model(self,
class QModelConvRNN (line 232) | class QModelConvRNN(QModelKeras):
method _build_model (line 238) | def _build_model(self, RNNLayer, conv_n_hidden, RNN_n_hidden, dense_un...
class QModelConvLSTM (line 262) | class QModelConvLSTM(QModelConvRNN):
method init (line 263) | def init(self):
method build_model (line 265) | def build_model(self, conv_n_hidden, RNN_n_hidden, dense_units, learni...
class QModelConvGRU (line 272) | class QModelConvGRU(QModelConvRNN):
method init (line 273) | def init(self):
method build_model (line 275) | def build_model(self, conv_n_hidden, RNN_n_hidden, dense_units, learni...
function load_model (line 287) | def load_model(fld, learning_rate):
FILE: src/emulator.py
function find_ideal (line 7) | def find_ideal(p, just_once):
class Market (line 20) | class Market:
method reset (line 32) | def reset(self, rand_price=True):
method get_state (line 47) | def get_state(self, t=None):
method get_valid_actions (line 56) | def get_valid_actions(self):
method get_noncash_reward (line 63) | def get_noncash_reward(self, t=None, empty=None):
method step (line 76) | def step(self, action):
method __init__ (line 94) | def __init__(self,
FILE: src/lib.py
function makedirs (line 9) | def makedirs(fld):
FILE: src/main.py
function get_model (line 11) | def get_model(model_type, env, learning_rate, fld_load):
function main (line 68) | def main():
FILE: src/sampler.py
function read_data (line 3) | def read_data(date, instrument, time_step):
class Sampler (line 15) | class Sampler:
method load_db (line 17) | def load_db(self, fld):
method build_db (line 30) | def build_db(self, n_episodes, fld):
method __sample_db (line 43) | def __sample_db(self):
class PairSampler (line 52) | class PairSampler(Sampler):
method __init__ (line 54) | def __init__(self, game,
method __randwalk (line 80) | def __randwalk(self, l):
method __randjump (line 86) | def __randjump(self, l):
method __sample (line 97) | def __sample(self):
class SinSampler (line 136) | class SinSampler(Sampler):
method __init__ (line 138) | def __init__(self, game,
method __rand_sin (line 177) | def __rand_sin(self,
method __sample_concat_sin (line 207) | def __sample_concat_sin(self):
method __sample_concat_sin_w_base (line 217) | def __sample_concat_sin_w_base(self):
method __sample_single_sin (line 232) | def __sample_single_sin(self):
function test_SinSampler (line 244) | def test_SinSampler():
function test_PairSampler (line 267) | def test_PairSampler():
FILE: src/simulators.py
class Simulator (line 5) | class Simulator:
method play_one_episode (line 7) | def play_one_episode(self, exploration, training=True, rand_price=True...
method train (line 44) | def train(self, n_episode,
method test (line 109) | def test(self, n_episode, save_per_episode=10, subfld='testing'):
method __init__ (line 155) | def __init__(self, agent, env,
FILE: src/visualizer.py
function get_tick_labels (line 5) | def get_tick_labels(bins, ticks):
class Visualizer (line 18) | class Visualizer:
method __init__ (line 20) | def __init__(self, action_labels):
method plot_a_episode (line 25) | def plot_a_episode(self,
method plot_episodes (line 71) | def plot_episodes(self,
function test_visualizer (line 117) | def test_visualizer():
class VisualizerSequential (line 144) | class VisualizerSequential:
method config (line 146) | def config(self):
method __init__ (line 149) | def __init__(self, model):
class VisualizerConv1D (line 165) | class VisualizerConv1D(VisualizerSequential):
method config (line 167) | def config(self):
method plot (line 179) | def plot(self, x):
method print_w (line 206) | def print_w(self):