Full Code of tz28/deep-learning for AI

master 9baa081a487a cached

14 files

142.5 KB

48.2k tokens

158 symbols

1 requests

Download .txt

Repository: tz28/deep-learning
Branch: master
Commit: 9baa081a487a
Files: 14
Total size: 142.5 KB

Directory structure:
gitextract_8bva7oa8/

├── README.md
├── batch_normalization.py
├── compare_initializations.py
├── deep_neural_network_ng.py
├── deep_neural_network_release.py
├── deep_neural_network_v1.py
├── deep_neural_network_v2.py
├── deep_neural_network_with_L2.py
├── deep_neural_network_with_dropout.py
├── deep_neural_network_with_gd.py
├── deep_neural_network_with_optimizers.py
├── dinos.txt
├── gradient_checking.py
└── rnn.py

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
# deep-learning
personal practice
---------------
深度学习个人练习，该项目实现了深度学习中一些常用的算法，内容包括：

+ 四种初始化方法：zero initialize, random initialize, xavier initialize, he initialize。

+ 深度神经网络

+ 正则化

+ dropout

+ 三种梯度下降方法：BGD, SGD, mini-batch

+ 六种优化算法：momentum、nesterov momentum、Adagrad、Adadelta、RMSprop、Adam

+ 梯度检验

+ batch normalization

+ recurrent neural network (RNN)
------

![#f03c15](https://placehold.it/15/f03c15/000000?text=+) ***Note: 下列 1-10中网络架构主要为四大块： initialize parameters、forward propagation、backward propagation、 update parameters，其中在 fp 和 bp 的时候各个功能没有单独封装，这样会导致耦合度过高，结构不清晰。
11中优化了网络结构，使得耦合度更低，网络结构推荐用11中的结构。<br>
今天（2018-9-23）重构了神经网络架构（见 deep_neural_network_release.py），把各功能函数分离出来，耦合度更低，结构更清楚，bp过程更加清晰。推荐此版本，用1-10时，可用此版本替换相应代码***

1、**deep_neural_network_v1.py**：自己实现的最简单的深度神经网络（多层感知机),不包含正则化,dropout,动量等...总之是最基本的,只有fp和bp。

2、**deep_neural_network_v2.py**:  自己实现的最简单的深度神经网络（多层感知机）,和v1的唯一区别在于：v1中fp过程,caches每一层存储的是（w,b,z,A_pre）,
而v2每一层存储的是（w,b,z,A）, 第0层存储的（None,None,None,X）,X即A0。    `个人更推荐用v2版本`.

关于具体的推导实现讲解，请移步本人的CSDN博客：https://blog.csdn.net/u012328159/article/details/79485767

3、**deep_neural_network_ng.py**: ---改正版ng在Coursera上的深度神经网络<br>
**具体主要改正的是对relu激活函数的求导，具体内容为:<br>**
```python
def relu_backward(dA, cache):
	"""
	Implement the backward propagation for a single RELU unit.
	
	Arguments:
	dA -- post-activation gradient, of any shape
	cache -- 'Z' where we store for computing backward propagation efficiently
	Returns:
	dZ -- Gradient of the cost with respect to Z
	"""
	Z = cache
	dZ = dA * np.int64(Z > 0)
	return dZ
```
**ng在作业中写的relu导数（个人认为是错的）为：<br>**
```python
def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.
    Arguments:
    
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently
    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    dZ = np.array(dA, copy=True) # just converting dz to a correct object.
    
    # When z <= 0, you should set dz to 0 as well. 
    dZ[Z <= 0] = 0
    
    assert (dZ.shape == Z.shape)
    
    return dZ
```

4、**compare_initializations.py**： 比较了四种初始化方法（初始化为0，随机初始化，Xavier initialization和He initialization），具体效果见CSDN博客：https://blog.csdn.net/u012328159/article/details/80025785

5、 **deep_neural_network_with_L2.py**: 带L2正则项正则项的网络（在deep_neural_network.py的基础上增加了L2正则项）

6、 **deep_neural_network_with_dropout.py** ：带dropout正则项的网络（在deep_neural_network.py的基础上增加了dropout正则项），具体详见CSDN博客：https://blog.csdn.net/u012328159/article/details/80210363

7、 **gradient_checking.py** ： use gradient checking in dnn，梯度检验，可以检查自己手撸的bp是否正确。具体原理，详见我的CSDN博客：https://blog.csdn.net/u012328159/article/details/80232585

8、 **deep_neural_network_with_gd.py** ：实现了三种梯度下降，包括：batch gradient descent（BGD）、stochastic gradient descent（SGD）和 mini-batch gradient descent。具体内容见我的CSDN博客：https://blog.csdn.net/u012328159/article/details/80252012

9、 **deep_neural_network_with_optimizers.py** ：实现了深度学习中几种优化器，包括：momentum、nesterov momentum、Adagrad、Adadelta、RMSprop、Adam。关于这几种算法，具体内容，见本人的CSDN博客：https://blog.csdn.net/u012328159/article/details/80311892

10、 **机器学习资料整理.pdf** ：整理了一些我知道的机器学习资料，希望能够帮助到想学习的同学。博客同步地址：https://blog.csdn.net/u012328159/article/details/80574713

11、 **batch_normalization.py** ：实现了batch normalization, 改进了整个网络的架构，使得网络的架构更加清晰，耦合度更低。关于batch normalization的具体内容，见本人的CSDN博客：https://blog.csdn.net/u012328159/article/details/82840084

12、 **deep_neural_network_release.py**：重构了深度神经网络，把各功能函数分离出来，耦合度更低，结构更清楚，bp过程更加清晰。**推荐此版本**

13、**rnn.py**：recurrent neural network，最简单的循环神经网络（确切来说是基于字符的），输入输出采用one-hot，场景：生成单词。包括，梯度裁剪、字符采样。关于bp推导，详见本人CSDN博客：https://blog.csdn.net/u012328159/article/details/84962285

<br>
<br>
--------
动态更新.................


================================================
FILE: batch_normalization.py
================================================
# implement the batch normalization
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import  load_breast_cancer
from sklearn.model_selection import train_test_split


#initialize parameters(w,b)
def initialize_parameters(layer_dims):
	"""
	:param layer_dims: list,每一层单元的个数（维度）
			gamma -- scale vector of shape (size of current layer ,1)
            beta -- offset vector of shape (size of current layer ,1)
	:return: parameter: directory store w1,w2,...,wL,b1,...,bL
			 bn_param: directory store moving_mean, moving_var
	"""
	np.random.seed(3)
	L = len(layer_dims)#the number of layers in the network
	parameters = {}
	# initialize the exponential weight average
	bn_param = {}
	for l in range(1,L):
		parameters["W" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01
		parameters["b" + str(l)] = np.zeros((layer_dims[l],1))
		parameters["gamma" + str(l)] = np.ones((layer_dims[l],1))
		parameters["beta" + str(l)] = np.zeros((layer_dims[l],1))
		bn_param["moving_mean" + str(l)] = np.zeros((layer_dims[l], 1))
		bn_param["moving_var" + str(l)] = np.zeros((layer_dims[l], 1))

	return parameters, bn_param

def relu_forward(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	A: output of activation
	"""
	A = np.maximum(0,Z)
	return A

#implement the activation function(ReLU and sigmoid)
def sigmoid_forward(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	"""
	A = 1 / (1 + np.exp(-Z))
	return A

def linear_forward(X, W, b):
	z = np.dot(W, X) + b
	return z

def batchnorm_forward(z, gamma, beta, epsilon = 1e-12):
	"""
	:param z: the input of activation (z = np.dot(W,A_pre) + b)
	:param epsilon: is a constant for denominator is 0
	:return: z_out, mean, variance
	"""
	mu = np.mean(z, axis=1, keepdims=True)#axis=1按行求均值
	var = np.var(z, axis=1, keepdims=True)
	sqrt_var = np.sqrt(var + epsilon)
	z_norm = (z - mu) / sqrt_var
	z_out = np.multiply(gamma,z_norm) + beta #对应元素点乘
	return z_out, mu, var, z_norm, sqrt_var


def forward_propagation(X, parameters, bn_param, decay = 0.9):
	"""
	X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "gamma1","beta1",W2", "b2","gamma2","beta2",...,"WL", "bL","gammaL","betaL"
                    W -- weight matrix of shape (size of current layer, size of previous layer)
                    b -- bias vector of shape (size of current layer,1)
                    gamma -- scale vector of shape (size of current layer ,1)
                    beta -- offset vector of shape (size of current layer ,1)
                    decay -- the parameter of exponential weight average
                    moving_mean = decay * moving_mean + (1 - decay) * current_mean
                    moving_var = decay * moving_var + (1 - decay) * moving_var
                    the moving_mean and moving_var are used for test
    :return:
	AL: the output of the last Layer(y_predict)
	caches: list, every element is a tuple:(A, W,b,gamma,sqrt_var,z_out,Z_norm)
	"""
	L = len(parameters) // 4  # number of layer
	A = X
	caches = []
	# calculate from 1 to L-1 layer
	for l in range(1,L):
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		gamma = parameters["gamma" + str(l)]
		beta = parameters["beta" + str(l)]
		z = linear_forward(A, W, b)
		z_out, mu, var, z_norm, sqrt_var = batchnorm_forward(z, gamma, beta) #batch normalization
		caches.append((A, W, b, gamma, sqrt_var, z_out, z_norm)) #以激活单元为分界线，把做激活前的变量放在一起，激活后可以认为是下一层的x了
		A = relu_forward(z_out) #relu activation function
		#exponential weight average for test
		bn_param["moving_mean" + str(l)] = decay * bn_param["moving_mean" + str(l)] + (1 - decay) * mu
		bn_param["moving_var" + str(l)] = decay * bn_param["moving_var" + str(l)] + (1 - decay) * var
	# calculate Lth layer(last layer)
	WL = parameters["W" + str(L)]
	bL = parameters["b" + str(L)]
	zL = linear_forward(A, WL, bL)
	caches.append((A, WL, bL, None, None, None, None))
	AL = sigmoid_forward(zL)
	return AL, caches, bn_param

#calculate cost function
def compute_cost(AL,Y):
	"""
	:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)
	:param Y:真实值,shape:(1, number of examples)
	:return:
	"""
	m = Y.shape[1]
	# cost = -1.0/m * np.sum(Y*np.log(AL)+(1-Y)*np.log(1.0 - AL))#py中*是点乘
	# cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T)) #推荐用这个，上面那个容易出错
	cost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) +
	                          np.multiply(-np.log(1 - AL), 1 - Y))
	#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2
	cost = np.squeeze(cost)
	# print('=====================cost===================')
	# print(cost)
	return cost

#derivation of relu
def relu_backward(dA, Z):
	"""
	:param Z: the input of activation function
	:return:
	"""
	dout = np.multiply(dA, np.int64(Z > 0))
	return dout

def batchnorm_backward(dout, cache):
	"""
	:param dout: Upstream derivatives
	:param cache:
	:return:
	"""
	_, _, _, gamma, sqrt_var, _, Z_norm = cache
	m = dout.shape[1]
	dgamma = np.sum(dout*Z_norm, axis=1, keepdims=True) #*作用于矩阵时为点乘
	dbeta = np.sum(dout, axis=1, keepdims=True)
	dy = 1./m * gamma * sqrt_var * (m * dout - np.sum(dout, axis=1, keepdims=True) - Z_norm*np.sum(dout*Z_norm, axis=1, keepdims=True))
	return dgamma, dbeta, dy

def linear_backward(dZ, cache):
	"""
	:param dZ: Upstream derivative, the shape (n^[l+1],m)
	:param A: input of this layer
	:return:
	"""
	A, W, _, _, _, _, _ = cache
	dW = np.dot(dZ, A.T)
	db = np.sum(dZ, axis=1, keepdims=True)
	da = np.dot(W.T, dZ)
	return da, dW, db

def backward_propagation(AL, Y, caches):
	"""
	Implement the backward propagation presented in figure 2.
	Arguments:
	Y -- true "label" vector (containing 0 if cat, 1 if non-cat)
	caches -- caches output from forward_propagation(),(w,b,gamma,sqrt_var,z_out,Z_norm,A)

	Returns:
	gradients -- A dictionary with the gradients with respect to dW,db
	"""
	m = Y.shape[1]
	L = len(caches)-1
	# print("L:   " + str(L))
	#calculate the Lth layer gradients
	dz = 1./m * (AL - Y)
	da, dWL, dbL = linear_backward(dz, caches[L])
	gradients = {"dW"+str(L+1): dWL, "db"+str(L+1): dbL}
	#calculate from L-1 to 1 layer gradients
	for l in reversed(range(0,L)): # L-1,L-3,....,1
		#relu_backward->batchnorm_backward->linear backward
		A, w, b, gamma, sqrt_var, z_out, z_norm = caches[l]
		#relu backward
		dout = relu_backward(da,z_out)
		#batch normalization
		dgamma, dbeta, dz = batchnorm_backward(dout,caches[l])
		# print("===============dz" + str(l+1) + "===================")
		# print(dz.shape)
		#linear backward
		da, dW, db = linear_backward(dz,caches[l])
		# print("===============dw"+ str(l+1) +"=============")
		# print(dW.shape)
		#gradient
		gradients["dW" + str(l+1)] = dW
		gradients["db" + str(l+1)] = db
		gradients["dgamma" + str(l+1)] = dgamma
		gradients["dbeta" + str(l+1)] = dbeta
	return gradients

def update_parameters(parameters, grads, learning_rate):
	"""
	:param parameters: dictionary, W, b
	:param grads: dW,db,dgamma,dbeta
	:param learning_rate: alpha
	:return:
	"""
	L = len(parameters) // 4
	for l in range(L):
		parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l+1)]
		parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l+1)]
		if l < L-1:
			parameters["gamma" + str(l + 1)] = parameters["gamma" + str(l + 1)] - learning_rate * grads["dgamma" + str(l + 1)]
			parameters["beta" + str(l + 1)] = parameters["beta" + str(l + 1)] - learning_rate * grads["dbeta" + str(l + 1)]
	return parameters

def random_mini_batches(X, Y, mini_batch_size = 64, seed=1):
	"""
	Creates a list of random minibatches from (X, Y)
	Arguments:
	X -- input data, of shape (input size, number of examples)
	Y -- true "label" vector (1 for blue dot / 0 for red dot), of shape (1, number of examples)
	mini_batch_size -- size of the mini-batches, integer

	Returns:
	mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
	"""
	np.random.seed(seed)
	m = X.shape[1]  # number of training examples
	mini_batches = []

	# Step 1: Shuffle (X, Y)
	permutation = list(np.random.permutation(m))
	shuffled_X = X[:, permutation]
	shuffled_Y = Y[:, permutation].reshape((1, m))

	# Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
	num_complete_minibatches = m // mini_batch_size  # number of mini batches of size mini_batch_size in your partitionning
	for k in range(0, num_complete_minibatches):
		mini_batch_X = shuffled_X[:, k * mini_batch_size: (k + 1) * mini_batch_size]
		mini_batch_Y = shuffled_Y[:, k * mini_batch_size: (k + 1) * mini_batch_size]
		mini_batch = (mini_batch_X, mini_batch_Y)
		mini_batches.append(mini_batch)

	# Handling the end case (last mini-batch < mini_batch_size)
	if m % mini_batch_size != 0:
		mini_batch_X = shuffled_X[:, num_complete_minibatches * mini_batch_size: m]
		mini_batch_Y = shuffled_Y[:, num_complete_minibatches * mini_batch_size: m]
		mini_batch = (mini_batch_X, mini_batch_Y)
		mini_batches.append(mini_batch)

	return mini_batches


def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations, mini_batch_size = 64):
	"""
	:param X:
	:param Y:
	:param layer_dims: list containing the input size and each layer size
	:param learning_rate:
	:param num_iterations:
	:return:
	parameters：final parameters:(W,b,gamma,beta)
	bn_param: moving_mean, moving_var
	"""
	costs = []
	# initialize parameters
	parameters, bn_param = initialize_parameters(layer_dims)
	seed = 0
	for i in range(0, num_iterations):
		seed = seed + 1
		minibatches = random_mini_batches(X, Y, mini_batch_size, seed)
		for minibatch in minibatches:
			# Select a minibatch
			(minibatch_X, minibatch_Y) = minibatch
			#foward propagation
			AL,caches,bn_param = forward_propagation(minibatch_X, parameters,bn_param)
			# calculate the cost
			cost = compute_cost(AL, minibatch_Y)
			#backward propagation
			grads = backward_propagation(AL, minibatch_Y, caches)
			#update parameters
			parameters = update_parameters(parameters, grads, learning_rate)
		if i % 200 == 0:
			print("Cost after iteration {}: {}".format(i, cost))
			costs.append(cost)
	print('length of cost')
	print(len(costs))
	plt.clf()
	plt.plot(costs)  # o-:圆形
	plt.xlabel("iterations(thousand)")  # 横坐标名字
	plt.ylabel("cost")  # 纵坐标名字
	plt.show()
	return parameters,bn_param

#fp for test
def forward_propagation_for_test(X, parameters, bn_param, epsilon = 1e-12):
	"""
	X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "gamma1","beta1",W2", "b2","gamma2","beta2",...,"WL", "bL","gammaL","betaL"
                    W -- weight matrix of shape (size of current layer, size of previous layer)
                    b -- bias vector of shape (size of current layer,1)
                    gamma -- scale vector of shape (size of current layer ,1)
                    beta -- offset vector of shape (size of current layer ,1)
                    decay -- the parameter of exponential weight average
                    moving_mean = decay * moving_mean + (1 - decay) * current_mean
                    moving_var = decay * moving_var + (1 - decay) * moving_var
                    the moving_mean and moving_var are used for test
    :return:
	AL: the output of the last Layer(y_predict)
	caches: list, every element is a tuple:(A, W,b,gamma,sqrt_var,z,Z_norm)
	"""
	L = len(parameters) // 4  # number of layer
	A = X
	# calculate from 1 to L-1 layer
	for l in range(1,L):
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		gamma = parameters["gamma" + str(l)]
		beta = parameters["beta" + str(l)]
		z = linear_forward(A, W, b)
		#batch normalization
		# exponential weight average
		moving_mean = bn_param["moving_mean" + str(l)]
		moving_var = bn_param["moving_var" + str(l)]
		sqrt_var = np.sqrt(moving_var + epsilon)
		z_norm = (z - moving_mean) / sqrt_var
		z_out = np.multiply(gamma, z_norm) + beta  # 对应元素点乘
		#relu forward
		A = relu_forward(z_out) #relu activation function

	# calculate Lth layer(last layer)
	WL = parameters["W" + str(L)]
	bL = parameters["b" + str(L)]
	zL = linear_forward(A, WL, bL)
	AL = sigmoid_forward(zL)
	return AL



#predict function
def predict(X_test, y_test, parameters, bn_param):
	"""
	:param X:
	:param y:
	:param parameters:
	:return:
	"""
	m = y_test.shape[1]
	Y_prediction = np.zeros((1, m))
	prob = forward_propagation_for_test(X_test, parameters, bn_param)
	for i in range(prob.shape[1]):
		# Convert probabilities A[0,i] to actual predictions p[0,i]
		if prob[0, i] > 0.5:
			Y_prediction[0, i] = 1
		else:
			Y_prediction[0, i] = 0
	accuracy = 1- np.mean(np.abs(Y_prediction - y_test))
	return accuracy

#DNN model
def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.01, num_iterations=10000, mini_batch_size=64):
	parameters, bn_param = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations, mini_batch_size)
	train_accuracy = predict(X_train, y_train, parameters, bn_param)
	test_accuracy = predict(X_test,y_test,parameters,bn_param)
	return train_accuracy, test_accuracy


if __name__ == "__main__":
	X_data, y_data = load_breast_cancer(return_X_y=True)
	X_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,test_size=0.2,random_state=28)
	X_train = X_train.T
	y_train = y_train.reshape(y_train.shape[0], -1).T
	X_test = X_test.T
	y_test = y_test.reshape(y_test.shape[0], -1).T
	train_accuracy, test_accuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],10,5,1], mini_batch_size = 256)
	print('train accuracy: ', train_accuracy)
	print('test accuracy: ', test_accuracy)

================================================
FILE: compare_initializations.py
================================================

#对比几种初始化方法
import numpy as np
import matplotlib.pyplot as plt

#初始化为0
def initialize_parameters_zeros(layers_dims):
	"""
	Arguments:
	layer_dims -- python array (list) containing the size of each layer.
	Returns:
	parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
					W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
					b1 -- bias vector of shape (layers_dims[1], 1)
					...
					WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
					bL -- bias vector of shape (layers_dims[L], 1)
	"""
	parameters = {}
	L = len(layers_dims)  # number of layers in the network

	for l in range(1, L):
		parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l - 1]))
		parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
	return parameters

#随机初始化
def initialize_parameters_random(layers_dims):
	"""
	Arguments:
	layer_dims -- python array (list) containing the size of each layer.

	Returns:
	parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
					W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
					b1 -- bias vector of shape (layers_dims[1], 1)
					...
					WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
					bL -- bias vector of shape (layers_dims[L], 1)
	"""
	np.random.seed(3)  # This seed makes sure your "random" numbers will be the as ours
	parameters = {}
	L = len(layers_dims)  # integer representing the number of layers
	for l in range(1, L):
		parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1])*0.01
		parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
	return parameters

#xavier initialization
def initialize_parameters_xavier(layers_dims):
	"""
	Arguments:
	layer_dims -- python array (list) containing the size of each layer.

	Returns:
	parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
					W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
					b1 -- bias vector of shape (layers_dims[1], 1)
					...
					WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
					bL -- bias vector of shape (layers_dims[L], 1)
	"""
	np.random.seed(3)
	parameters = {}
	L = len(layers_dims)  # integer representing the number of layers
	for l in range(1, L):
		parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * np.sqrt(1 / layers_dims[l - 1])
		parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
	return parameters

#He initialization
def initialize_parameters_he(layers_dims):
	"""
	Arguments:
	layer_dims -- python array (list) containing the size of each layer.

	Returns:
	parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
					W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
					b1 -- bias vector of shape (layers_dims[1], 1)
					...
					WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
					bL -- bias vector of shape (layers_dims[L], 1)
	"""
	np.random.seed(3)
	parameters = {}
	L = len(layers_dims)  # integer representing the number of layers

	for l in range(1, L):
		parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * np.sqrt(2 / layers_dims[l - 1])
		parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
	return parameters

def relu(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	A: output of activation
	"""
	A = np.maximum(0,Z)
	return A


def initialize_parameters(layer_dims):
	"""
	:param layer_dims: list,每一层单元的个数（维度）
	:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL
	"""
	np.random.seed(3)
	L = len(layer_dims)#the number of layers in the network
	parameters = {}
	for l in range(1,L):
		parameters["W" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*np.sqrt(2 / layer_dims[l - 1])
		parameters["b" + str(l)] = np.zeros((layer_dims[l],1))
	return parameters

def forward_propagation(initialization="he"):
	data = np.random.randn(1000, 100000)
	layers_dims = [1000,800,500,300,200,100,10]
	num_layers = len(layers_dims)
	# Initialize parameters dictionary.
	if initialization == "zeros":
		parameters = initialize_parameters_zeros(layers_dims)
	elif initialization == "random":
		parameters = initialize_parameters_random(layers_dims)
	elif initialization == "xavier":
		parameters = initialize_parameters_xavier(layers_dims)
	elif initialization == "he":
		parameters = initialize_parameters_he(layers_dims)
	A = data
	for l in range(1,num_layers):
		A_pre = A
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		z = np.dot(W,A_pre) + b #计算z = wx + b
		# A = np.tanh(z) #relu activation function
		A = relu(z)
		plt.subplot(2,3,l)
		plt.hist(A.flatten(),facecolor='g')
		plt.xlim([-1,1])
		plt.yticks([])
	plt.show()

if __name__ == '__main__':
	forward_propagation()

================================================
FILE: deep_neural_network_ng.py
================================================
import numpy as np
from machine_learning.deep_neural_network.init_utils import load_dataset
import matplotlib.pyplot as plt

#initialize parameters(w,b)
def initialize_parameters(layer_dims):
	"""
	:param layer_dims: list,每一层单元的个数（维度）
	:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL
	"""
	L = len(layer_dims)#the number of layers in the network
	parameters = {}
	np.random.seed(3)
	for l in range(1,L):
		parameters["W" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])
		parameters["b" + str(l)] = np.zeros((layer_dims[l],1))
	return parameters
#Implement the linear part of a layer's forward propagation: z = w[l] * a[l-1] + b[l]
def linear_forward(A_pre,W,b):
	"""
	:param A_pre:上一层的激活值,shape:(size of previous layer,m)
	:param W: weight matrix,shape:(size of current layer,size of previous layer)
	:param b: bias vector,shape:(size of current layer,1)
	:return:
	Z：激活函数的输入值（线性相加和）
	cache：因为bp的时候要用到w，b，a所以要把每一层的都存起来，方便后面用
	"""
	Z = np.dot(W,A_pre) + b
	cache = (A_pre,W,b)
	return Z,cache
#implement the activation function(ReLU and sigmoid)
def relu(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	A: output of activation
	activation_cache: 要把Z保存起来，因为后面bp，对relu求导，求dz的时候要用到
	"""
	A = np.maximum(0,Z)
	activation_cache = Z #要把Z保存起来，因为后面bp，对relu求导，求dz的时候要用到
	return A, activation_cache
#implement the activation function(ReLU and sigmoid)
def sigmoid(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	"""
	A = 1 / (1 + np.exp(-Z))
	activation_cache = Z
	return A,activation_cache
#calculate the output of the activation
def linear_activation_forward(A_pre,W,b,activation):
	"""
	:param A_pre: activations from previous layer,shape(size of previous layer, number of examples)
	:param W:weights matrix,shape(size of current layer, size of previous layer)
	:param b:bias vector, shape(size of the current layer, 1)
	:param activation:the activation to be used in this layer(ReLu or sigmoid)
	:return:
	A: the output of the activation function
	cache: tuple,形式为:((A_pre,W,b),Z),后面bp要用到的((A_pre,W,b),Z)
	"""
	if activation == "sigmoid":
		Z, linear_cache = linear_forward(A_pre,W,b)#linear_cache:(A_pre,W,b)
		A, activation_cache = sigmoid(Z)# activation_cache: Z
	elif activation == "relu":
		Z, linear_cache = linear_forward(A_pre, W, b)#linear_cache:(A_pre,W,b)
		A, activation_cache = relu(Z)# activation_cache: Z
	cache = (linear_cache, activation_cache)
	return A, cache
# Implement the forward propagation of the L-layer model
def L_model_forward(X,parameters):
	"""
	:param X: data set,input matrix,shape(feature dimensions,number of example)
	:param parameters: W,b
	:return:
	AL: activation of Lth layer i.e. y_hat(y_predict)
	caches: list,存储每一层的linear_cache(A_pre,W,b),activation_cache(Z)
	"""
	caches = []#用于存储每一层的，A_pre,W,b,Z
	A = X
	L = len(parameters) // 2 # number of layer
	#calculate from 1 to L-1 layer activation
	for l in range(1,L):
		A_pre = A
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		A, cache = linear_activation_forward(A_pre, W, b, "relu")
		caches.append(cache)
	#calculate Lth layer activation
	AL, cache = linear_activation_forward(A,parameters["W" + str(L)],parameters["b" + str(L)],"sigmoid")
	caches.append(cache)
	# print("W1: " + str(caches[0][0][1].shape))
	# print(caches[0][0][1])
	# print("b1: " + str(caches[0][0][2].shape))
	# print(caches[0][0][2])
	# print("Z1: " + str(caches[0][1].shape))
	# print(caches[0][1])
	# print("A1: " + str(caches[1][0][0].shape))
	# print(caches[1][0][0])
	# print('==========================')
	# print("W2: " + str(caches[1][0][1].shape))
	# print(caches[1][0][1])
	# print("b2: " + str(caches[1][0][2].shape))
	# print(caches[1][0][2])
	# print("Z2: " + str(caches[1][1].shape))
	# print(caches[1][1])
	# print("A2: " + str(caches[2][0][0].shape))
	# print(caches[2][0][0])
	# print('==========================')
	# print("W3: " + str(caches[2][0][1].shape))
	# print(caches[2][0][1])
	# print("b3: " + str(caches[2][0][2].shape))
	# print(caches[2][0][2])
	# print("Z3: " + str(caches[2][1].shape))
	# print(caches[2][1])
	# print("A3: " + str(AL.shape))
	# print(AL)
	return AL,caches
#calculate cost function
def compute_cost(AL,Y):
	"""
	:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)
	:param Y:真实值,shape:(1, number of examples)
	:return:
	"""
	m = Y.shape[1]
	cost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) + np.multiply(-np.log(1 - AL), 1 - Y))
	#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2
	cost = np.squeeze(cost)
	return cost

def sigmoid_backward(dA, Z):
	"""
	:param dA:
	:param Z:
	:return:
	"""
	a = 1/(1 + np.exp(-Z))
	dZ = dA * a*(1-a)
	return dZ


def relu_backward(dA, cache):
	"""
	Implement the backward propagation for a single RELU unit.

	Arguments:
	dA -- post-activation gradient, of any shape
	cache -- 'Z' where we store for computing backward propagation efficiently

	Returns:
	dZ -- Gradient of the cost with respect to Z
	"""
	Z = cache
	dZ = dA * np.int64(Z > 0) #
	return dZ
	
#calculate dA_pre,dW,db
def linear_backward(dZ, cache):
	"""
	:param dZ:
	:param cache: 前面fp保存的linear_cache(A_pre,W,b)
	:return:
	"""
	A_prev, W, b = cache
	m = A_prev.shape[1]
	dW = 1/m * np.dot(dZ,A_prev.T)#有时候不敢确定是线代乘还是点乘，有个小trick就是dW维度一定和W保持一致，这样就好确定是np.dot()还是*了
	db = 1/m * np.sum(dZ,axis=1,keepdims=True)
	dA_pre = np.dot(W.T,dZ)
	return dA_pre,dW,db

def linear_activation_backward(dA, cache, activation):
	"""
	:param dA:
	:param cache:
	:param activation:
	:return:
	"""
	linear_cache, activation_cache = cache #((A_pre,W,b),Z)
	if activation == "relu":
		dZ = relu_backward(dA, activation_cache)
		dA_pre, dW, db = linear_backward(dZ,linear_cache)
	elif activation == "sigmoid":
		dZ = sigmoid_backward(dA, activation_cache)
		dA_pre, dW, db = linear_backward(dZ, linear_cache)
	return dA_pre, dW, db

# Implement the backward propagation of the L-layer model
def L_model_backward(AL, Y, caches):
	"""
	:param AL: 最后一层激活值（i.e y_hat）
	:param Y: 实际类别(0,1)
	:param caches: fp时各层的((A_pre,W,b),Z)
	:return:
	"""
	grads = {}#存放各层的dW，db
	L = len(caches)
	# print('L:  ' + str(L))
	# 这个地方之所以没有1/m，是因为对Z,A等中间变量求导时，直接使用的是交叉熵函数对Z,A求导，
	# 而不是cost function，只有对W，b求导时使用cost function
	#第L层单独算,因为激活函数是sigmoid
	dAL = -(np.divide(Y,AL) - np.divide((1-Y),(1-AL)))
	# print("dAL: ")
	# print(dAL)
	current_cache = caches[L - 1]
	grads["dA" + str(L - 1)], grads["dW" + str(L)], grads["db" + str(L)] \
		= linear_activation_backward(dAL,current_cache,"sigmoid")
	for l in reversed(range(L-1)):
		current_cache = caches[l]
		dA_pre_temp, dW_temp, db_temp \
			= linear_activation_backward(grads["dA" + str(l + 1)],current_cache,"relu")
		grads["dA" + str(l)] = dA_pre_temp
		grads["dW" + str(l+1)] = dW_temp
		grads["db" + str(l+1)] = db_temp
	# print("******************************梯度*************************")
	# print(grads)
	return grads
# update w,b
def update_parameters(parameters, grads, learning_rate):
	"""
	:param parameters: dictionary,  W,b
	:param grads: dW,db
	:param learning_rate: alpha
	:return:
	"""
	L = len(parameters) // 2
	for l in range(L):
		parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l+1)]
		parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l+1)]
	# print(parameters)
	return parameters

def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations):
	"""
	:param X:
	:param Y:
	:param layer_dims:list containing the input size and each layer size
	:param learning_rate:
	:param num_iterations:
	:return:
	parameters：final parameters:(W,b)
	"""
	costs = []
	# initialize parameters
	parameters = initialize_parameters(layer_dims)
	for i in range(0, num_iterations):
		#foward propagation
		AL,caches = L_model_forward(X, parameters)
		# calculate the cost
		cost = compute_cost(AL, Y)
		if i % 1000 == 0:
			print("Cost after iteration {}: {}".format(i, cost))
			costs.append(cost)
		#backward propagation
		grads = L_model_backward(AL, Y, caches)
		#update parameters
		parameters = update_parameters(parameters, grads, learning_rate)
	plt.clf()
	plt.plot(costs)  # o-:圆形
	plt.xlabel("iterations")  # 横坐标名字
	plt.ylabel("cost")  # 纵坐标名字
	plt.show()
	return parameters

#predict function
def predict(X,y,parameters):
	"""
	:param X:
	:param y:
	:param parameters:
	:return:
	"""
	m = y.shape[1]
	Y_prediction = np.zeros((1, m))
	prob, caches = L_model_forward(X,parameters)
	for i in range(prob.shape[1]):
		# Convert probabilities A[0,i] to actual predictions p[0,i]
		### START CODE HERE ### (≈ 4 lines of code)
		if prob[0, i] > 0.5:
			Y_prediction[0, i] = 1
		else:
			Y_prediction[0, i] = 0
	accuracy = 1- np.mean(np.abs(Y_prediction - y))
	return accuracy
#DNN model
def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.01, num_iterations=15000):
	parameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations)
	print('===========================测试集=================================')
	accuracy = predict(X_test,y_test,parameters)
	return accuracy

if __name__ == "__main__":
	X_train, y_train, X_test, y_test = load_dataset()
	accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1])
	print(accuracy)

================================================
FILE: deep_neural_network_release.py
================================================
"""
把各部分分离出来，降低耦合度，使得结构更加清晰
"""

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import  load_breast_cancer
from sklearn.model_selection import train_test_split



#initialize parameters(w,b)
def initialize_parameters(layer_dims):
	"""
	:param layer_dims: list,每一层单元的个数（维度）
	:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL
	"""
	np.random.seed(3)
	L = len(layer_dims)#the number of layers in the network
	parameters = {}
	for l in range(1,L):
		parameters["W" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.1
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization
		# parameters["W" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization
		parameters["b" + str(l)] = np.zeros((layer_dims[l],1))
	return parameters

def linear_forward(x, w, b):
	"""
	:param x:
	:param w:
	:param b:
	:return:
	"""
	z = np.dot(w, x) + b  # 计算z = wx + b
	return z

def relu_forward(Z):
	"""
	:param Z: Output of the activation layer
	:return:
	A: output of activation
	"""
	A = np.maximum(0,Z)
	return A

#implement the activation function(ReLU and sigmoid)
def sigmoid(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	"""
	A = 1 / (1 + np.exp(-Z))
	return A

def forward_propagation(X, parameters):
	"""
	X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2",...,"WL", "bL"
                    W -- weight matrix of shape (size of current layer, size of previous layer)
                    b -- bias vector of shape (size of current layer,1)
    :return:
	AL: the output of the last Layer(y_predict)
	caches: list, every element is a tuple:(W,b,z,A_pre)
	"""
	L = len(parameters) // 2  # number of layer
	A = X
	caches = []
	# calculate from 1 to L-1 layer
	for l in range(1,L):
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		#linear forward -> relu forward ->linear forward....
		z = linear_forward(A, W, b)
		caches.append((A, W, b, z))  # 以激活函数为分割，到z认为是这一层的，激活函数的输出值A认为是下一层的输入，划归到下一层。注意cache的位置，要放在relu前面。
		A = relu_forward(z) #relu activation function
	# calculate Lth layer
	WL = parameters["W" + str(L)]
	bL = parameters["b" + str(L)]
	zL = linear_forward(A, WL, bL)
	caches.append((A, WL, bL, zL))
	AL = sigmoid(zL)
	return AL, caches

#calculate cost function
def compute_cost(AL,Y):
	"""
	:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)
	:param Y:真实值,shape:(1, number of examples)
	:return:
	"""
	m = Y.shape[1]
	# cost = -1.0/m * np.sum(Y*np.log(AL)+(1-Y)*np.log(1.0 - AL))#py中*是点乘
	# cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T)) #推荐用这个，上面那个容易出错
	cost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) +
	                          np.multiply(-np.log(1 - AL), 1 - Y))
	#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2
	cost = np.squeeze(cost)
	# print('=====================cost===================')
	# print(cost)
	return cost


#derivation of relu
def relu_backward(dA, Z):
	"""
	:param Z: the input of activation function
	:param dA:
	:return:
	"""
	dout = np.multiply(dA, np.int64(Z > 0)) #J对z的求导
	return dout

#derivation of linear
def linear_backward(dZ, cache):
	"""
	:param dZ: Upstream derivative, the shape (n^[l+1],m)
	:param A: input of this layer
	:return:
	"""
	A, W, b, z = cache
	dW = np.dot(dZ, A.T)
	db = np.sum(dZ, axis=1, keepdims=True)
	da = np.dot(W.T, dZ)
	return da, dW, db


def backward_propagation(AL, Y, caches):
	"""
	Implement the backward propagation presented in figure 2.
	Arguments:
	X -- input dataset, of shape (input size, number of examples)
	Y -- true "label" vector (containing 0 if cat, 1 if non-cat)
	caches -- caches output from forward_propagation(),(W,b,z,pre_A)

	Returns:
	gradients -- A dictionary with the gradients with respect to dW,db
	"""
	m = Y.shape[1]
	L = len(caches) - 1
	#calculate the Lth layer gradients
	dz = 1. / m * (AL - Y)
	da, dWL, dbL = linear_backward(dz, caches[L])
	gradients = {"dW" + str(L + 1): dWL, "db" + str(L + 1): dbL}

	#calculate from L-1 to 1 layer gradients
	for l in reversed(range(0,L)): # L-1,L-3,....,0
		A, W, b, z = caches[l]
		#ReLu backward -> linear backward
		#relu backward
		dout = relu_backward(da, z)
		#linear backward
		da, dW, db = linear_backward(dout, caches[l])
		# print("========dW" + str(l+1) + "================")
		# print(dW.shape)
		gradients["dW" + str(l+1)] = dW
		gradients["db" + str(l+1)] = db
	return gradients

def update_parameters(parameters, grads, learning_rate):
	"""
	:param parameters: dictionary,  W,b
	:param grads: dW,db
	:param learning_rate: alpha
	:return:
	"""
	L = len(parameters) // 2
	for l in range(L):
		parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l+1)]
		parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l+1)]
	return parameters

def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations):
	"""
	:param X:
	:param Y:
	:param layer_dims:list containing the input size and each layer size
	:param learning_rate:
	:param num_iterations:
	:return:
	parameters：final parameters:(W,b)
	"""
	costs = []
	# initialize parameters
	parameters = initialize_parameters(layer_dims)
	for i in range(0, num_iterations):
		#foward propagation
		AL,caches = forward_propagation(X, parameters)
		# calculate the cost
		cost = compute_cost(AL, Y)
		if i % 1000 == 0:
			print("Cost after iteration {}: {}".format(i, cost))
			costs.append(cost)
		#backward propagation
		grads = backward_propagation(AL, Y, caches)
		#update parameters
		parameters = update_parameters(parameters, grads, learning_rate)
	print('length of cost')
	print(len(costs))
	plt.clf()
	plt.plot(costs)  # o-:圆形
	plt.xlabel("iterations(thousand)")  # 横坐标名字
	plt.ylabel("cost")  # 纵坐标名字
	plt.show()
	return parameters

#predict function
def predict(X_test,y_test,parameters):
	"""
	:param X:
	:param y:
	:param parameters:
	:return:
	"""
	m = y_test.shape[1]
	Y_prediction = np.zeros((1, m))
	prob, caches = forward_propagation(X_test,parameters)
	for i in range(prob.shape[1]):
		# Convert probabilities A[0,i] to actual predictions p[0,i]
		if prob[0, i] > 0.5:
			Y_prediction[0, i] = 1
		else:
			Y_prediction[0, i] = 0
	accuracy = 1- np.mean(np.abs(Y_prediction - y_test))
	return accuracy

#DNN model
def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.001, num_iterations=30000):
	parameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations)
	accuracy = predict(X_test,y_test,parameters)
	return accuracy

if __name__ == "__main__":
	X_data, y_data = load_breast_cancer(return_X_y=True)
	X_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)
	X_train = X_train.T
	y_train = y_train.reshape(y_train.shape[0], -1).T
	X_test = X_test.T
	y_test = y_test.reshape(y_test.shape[0], -1).T
	accuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],10,5,1])
	print(accuracy)

================================================
FILE: deep_neural_network_v1.py
================================================
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import  load_breast_cancer
from sklearn.model_selection import train_test_split
#initialize parameters(w,b)
def initialize_parameters(layer_dims):
	"""
	:param layer_dims: list,每一层单元的个数（维度）
	:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL
	"""
	np.random.seed(3)
	L = len(layer_dims)#the number of layers in the network
	parameters = {}
	for l in range(1,L):
		parameters["W" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01
		parameters["b" + str(l)] = np.zeros((layer_dims[l],1))
	return parameters
def relu(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	A: output of activation
	"""
	A = np.maximum(0,Z)
	return A
#implement the activation function(ReLU and sigmoid)
def sigmoid(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	"""
	A = 1 / (1 + np.exp(-Z))
	return A

def forward_propagation(X, parameters):
	"""
	X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2",...,"WL", "bL"
                    W -- weight matrix of shape (size of current layer, size of previous layer)
                    b -- bias vector of shape (size of current layer,1)
    :return:
	AL: the output of the last Layer(y_predict)
	caches: list, every element is a tuple:(W,b,z,A_pre)
	"""
	L = len(parameters) // 2  # number of layer
	A = X
	caches = []  # 用于存储每一层的，w,b,z,A
	# calculate from 1 to L-1 layer
	for l in range(1,L):
		A_pre = A
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		z = np.dot(W,A_pre) + b #计算z = wx + b
		A = relu(z) #relu activation function
		caches.append((W,b,z,A_pre))
	# calculate Lth layer
	WL = parameters["W" + str(L)]
	bL = parameters["b" + str(L)]
	zL = np.dot(WL,A) + bL
	AL = sigmoid(zL)
	caches.append((WL,bL,zL,A))
	return AL, caches
#calculate cost function
def compute_cost(AL,Y):
	"""
	:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)
	:param Y:真实值,shape:(1, number of examples)
	:return:
	"""
	m = Y.shape[1]
	# cost = -1.0/m * np.sum(Y*np.log(AL)+(1-Y)*np.log(1.0 - AL))#py中*是点乘
	# cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T)) #推荐用这个，上面那个容易出错
	cost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) +
	                          np.multiply(-np.log(1 - AL), 1 - Y))
	#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2
	cost = np.squeeze(cost)
	return cost

# derivation of relu
def relu_backward(Z):
	"""
	:param Z: the input of activation
	:return:
	"""
	dA = np.int64(Z > 0)
	return dA

def backward_propagation(AL, Y, caches):
	"""
	Implement the backward propagation presented in figure 2.
	Arguments:
	X -- input dataset, of shape (input size, number of examples)
	Y -- true "label" vector (containing 0 if cat, 1 if non-cat)
	caches -- caches output from forward_propagation(),(W,b,z,pre_A)

	Returns:
	gradients -- A dictionary with the gradients with respect to dW,db
	"""
	m = Y.shape[1]
	L = len(caches)
	# print("L:   " + str(L))
	#calculate the Lth layer gradients
	prev_AL = caches[L-1][3]
	dzL = 1./m * (AL - Y)
	# print(dzL.shape)
	# print(prev_AL.T.shape)
	dWL = np.dot(dzL, prev_AL.T)
	dbL = np.sum(dzL, axis=1, keepdims=True)
	gradients = {"dW"+str(L):dWL, "db"+str(L):dbL}
	#calculate from L-1 to 1 layer gradients
	for l in reversed(range(0,L-1)):
		post_W= caches[l+1][0] #要用后一层的W
		dz = dzL #用后一层的dz

		dal = np.dot(post_W.T, dz)
		z = caches[l+1][2]#当前层的z
		dzl = np.multiply(dal, relu_backward(z))#可以直接用dzl = np.multiply(dal, np.int64(z > 0))来实现
		prev_A = caches[l][3]#前一层的A
		dWl = np.dot(dzl, prev_A.T)
		dbl = np.sum(dzl, axis=1, keepdims=True)

		gradients["dW" + str(l+1)] = dWl
		gradients["db" + str(l+1)] = dbl
		dzL = dzl #更新dz
	return gradients

def update_parameters(parameters, grads, learning_rate):
	"""
	:param parameters: dictionary,  W,b
	:param grads: dW,db
	:param learning_rate: alpha
	:return:
	"""
	L = len(parameters) // 2
	for l in range(L):
		parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l+1)]
		parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l+1)]
	return parameters

def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations):
	"""
	:param X:
	:param Y:
	:param layer_dims:list containing the input size and each layer size
	:param learning_rate:
	:param num_iterations:
	:return:
	parameters：final parameters:(W,b)
	"""
	costs = []
	# initialize parameters
	parameters = initialize_parameters(layer_dims)
	for i in range(0, num_iterations):
		#foward propagation
		AL,caches = forward_propagation(X, parameters)
		# calculate the cost
		cost = compute_cost(AL, Y)
		if i % 1000 == 0:
			print("Cost after iteration {}: {}".format(i, cost))
			costs.append(cost)
		#backward propagation
		grads = backward_propagation(AL, Y, caches)
		#update parameters
		parameters = update_parameters(parameters, grads, learning_rate)
	plt.clf()
	plt.plot(costs)
	plt.xlabel("iterations(thousand)")  # 横坐标名字
	plt.ylabel("cost")  # 纵坐标名字
	plt.show()
	return parameters

#predict function
def predict(X_test,y_test,parameters):
	"""
	:param X:
	:param y:
	:param parameters:
	:return:
	"""
	m = y_test.shape[1]
	Y_prediction = np.zeros((1, m))
	prob, caches = forward_propagation(X_test,parameters)
	for i in range(prob.shape[1]):
		# Convert probabilities A[0,i] to actual predictions p[0,i]
		if prob[0, i] > 0.5:
			Y_prediction[0, i] = 1
		else:
			Y_prediction[0, i] = 0
	accuracy = 1- np.mean(np.abs(Y_prediction - y_test))
	return accuracy
#DNN model
def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.01, num_iterations=15000):
	parameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations)
	accuracy = predict(X_test,y_test,parameters)
	return accuracy
if __name__ == "__main__":
	X_data, y_data = load_breast_cancer(return_X_y=True)
	X_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)
	X_train = X_train.T
	y_train = y_train.reshape(y_train.shape[0], -1).T
	X_test = X_test.T
	y_test = y_test.reshape(y_test.shape[0], -1).T
	accuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],20, 10, 5, 1])
	print(accuracy)

================================================
FILE: deep_neural_network_v2.py
================================================
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import  load_breast_cancer
from sklearn.model_selection import train_test_split
#initialize parameters(w,b)
def initialize_parameters(layer_dims):
	"""
	:param layer_dims: list,每一层单元的个数（维度）
	:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL
	"""
	np.random.seed(3)
	L = len(layer_dims)#the number of layers in the network
	parameters = {}
	for l in range(1,L):
		parameters["W" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization
		# parameters["W" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization
		parameters["b" + str(l)] = np.zeros((layer_dims[l],1))
	return parameters
def relu(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	A: output of activation
	"""
	A = np.maximum(0,Z)
	return A
#implement the activation function(ReLU and sigmoid)
def sigmoid(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	"""
	A = 1 / (1 + np.exp(-Z))
	return A

def forward_propagation(X, parameters):
	"""
	X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2",...,"WL", "bL"
                    W -- weight matrix of shape (size of current layer, size of previous layer)
                    b -- bias vector of shape (size of current layer,1)
    :return:
	AL: the output of the last Layer(y_predict)
	caches: list, every element is a tuple:(W,b,z,A_pre)
	"""
	L = len(parameters) // 2  # number of layer
	A = X
	caches = [(None,None,None,X)]  # 第0层(None,None,None,A0) w,b,z用none填充,下标与层数一致，用于存储每一层的，w,b,z,A
	# calculate from 1 to L-1 layer
	for l in range(1,L):
		A_pre = A
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		z = np.dot(W,A_pre) + b #计算z = wx + b
		A = relu(z) #relu activation function
		caches.append((W,b,z,A))
	# calculate Lth layer
	WL = parameters["W" + str(L)]
	bL = parameters["b" + str(L)]
	zL = np.dot(WL,A) + bL
	AL = sigmoid(zL)
	caches.append((WL,bL,zL,AL))
	return AL, caches
#calculate cost function
def compute_cost(AL,Y):
	"""
	:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)
	:param Y:真实值,shape:(1, number of examples)
	:return:
	"""
	m = Y.shape[1]
	# cost = -1.0/m * np.sum(Y*np.log(AL)+(1-Y)*np.log(1.0 - AL))#py中*是点乘
	# cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T)) #推荐用这个，上面那个容易出错
	cost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) +
	                          np.multiply(-np.log(1 - AL), 1 - Y))
	#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2
	cost = np.squeeze(cost)
	# print('=====================cost===================')
	# print(cost)
	return cost

# derivation of relu
def relu_backward(Z):
	"""
	:param Z: the input of activation
	:return:
	"""
	dA = np.int64(Z > 0)
	return dA

def backward_propagation(AL, Y, caches):
	"""
	Implement the backward propagation presented in figure 2.
	Arguments:
	X -- input dataset, of shape (input size, number of examples)
	Y -- true "label" vector (containing 0 if cat, 1 if non-cat)
	caches -- caches output from forward_propagation(),(W,b,z,pre_A)

	Returns:
	gradients -- A dictionary with the gradients with respect to dW,db
	"""
	m = Y.shape[1]
	L = len(caches) - 1
	# print("L:   " + str(L))
	#calculate the Lth layer gradients
	prev_AL = caches[L-1][3]
	dzL = 1./m * (AL - Y)
	# print(dzL.shape)
	# print(prev_AL.T.shape)
	dWL = np.dot(dzL, prev_AL.T)
	dbL = np.sum(dzL, axis=1, keepdims=True)
	gradients = {"dW"+str(L):dWL, "db"+str(L):dbL}
	#calculate from L-1 to 1 layer gradients
	for l in reversed(range(1,L)): # L-1,L-3,....,1
		post_W= caches[l+1][0] #要用后一层的W
		dz = dzL #用后一层的dz

		dal = np.dot(post_W.T, dz)
		Z = caches[l][2]#当前层的Z
		dzl = np.multiply(dal, relu_backward(Z))#可以直接用dzl = np.multiply(dal, np.int64(Z > 0))来实现
		prev_A = caches[l-1][3]#前一层的A
		dWl = np.dot(dzl, prev_A.T)
		dbl = np.sum(dzl, axis=1, keepdims=True)

		gradients["dW" + str(l)] = dWl
		gradients["db" + str(l)] = dbl
		dzL = dzl #更新dz
	return gradients

def update_parameters(parameters, grads, learning_rate):
	"""
	:param parameters: dictionary,  W,b
	:param grads: dW,db
	:param learning_rate: alpha
	:return:
	"""
	L = len(parameters) // 2
	for l in range(L):
		parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l+1)]
		parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l+1)]
	return parameters

def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations):
	"""
	:param X:
	:param Y:
	:param layer_dims:list containing the input size and each layer size
	:param learning_rate:
	:param num_iterations:
	:return:
	parameters：final parameters:(W,b)
	"""
	costs = []
	# initialize parameters
	parameters = initialize_parameters(layer_dims)
	for i in range(0, num_iterations):
		#foward propagation
		AL,caches = forward_propagation(X, parameters)
		# calculate the cost
		cost = compute_cost(AL, Y)
		if i % 1000 == 0:
			print("Cost after iteration {}: {}".format(i, cost))
			costs.append(cost)
		#backward propagation
		grads = backward_propagation(AL, Y, caches)
		#update parameters
		parameters = update_parameters(parameters, grads, learning_rate)
	print('length of cost')
	print(len(costs))
	plt.clf()
	plt.plot(costs)  # o-:圆形
	plt.xlabel("iterations(thousand)")  # 横坐标名字
	plt.ylabel("cost")  # 纵坐标名字
	plt.show()
	return parameters

#predict function
def predict(X_test,y_test,parameters):
	"""
	:param X:
	:param y:
	:param parameters:
	:return:
	"""
	m = y_test.shape[1]
	Y_prediction = np.zeros((1, m))
	prob, caches = forward_propagation(X_test,parameters)
	for i in range(prob.shape[1]):
		# Convert probabilities A[0,i] to actual predictions p[0,i]
		if prob[0, i] > 0.5:
			Y_prediction[0, i] = 1
		else:
			Y_prediction[0, i] = 0
	accuracy = 1- np.mean(np.abs(Y_prediction - y_test))
	return accuracy
#DNN model
def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.01, num_iterations=15000):
	parameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations)
	accuracy = predict(X_test,y_test,parameters)
	return accuracy
if __name__ == "__main__":
	X_data, y_data = load_breast_cancer(return_X_y=True)
	X_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)
	X_train = X_train.T
	y_train = y_train.reshape(y_train.shape[0], -1).T
	X_test = X_test.T
	y_test = y_test.reshape(y_test.shape[0], -1).T
	accuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],20, 10, 5, 1])
	print(accuracy)

================================================
FILE: deep_neural_network_with_L2.py
================================================
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import  load_breast_cancer
from sklearn.model_selection import train_test_split

#initialize parameters(w,b)
def initialize_parameters(layer_dims):
	"""
	:param layer_dims: list,每一层单元的个数（维度）
	:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL
	"""
	np.random.seed(3)
	L = len(layer_dims)#the number of layers in the network
	parameters = {}
	for l in range(1,L):
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization
		# parameters["W" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果
		parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization
		parameters["b" + str(l)] = np.zeros((layer_dims[l],1))
	return parameters

def relu(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	A: output of activation
	"""
	A = np.maximum(0,Z)
	return A
#implement the activation function(ReLU and sigmoid)
def sigmoid(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	"""
	A = 1 / (1 + np.exp(-Z))
	return A

def forward_propagation(X, parameters):
	"""
	X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2",...,"WL", "bL"
                    W -- weight matrix of shape (size of current layer, size of previous layer)
                    b -- bias vector of shape (size of current layer,1)
    :return:
	AL: the output of the last Layer(y_predict)
	caches: list, every element is a tuple:(W,b,z,A_pre)
	"""
	L = len(parameters) // 2  # number of layer
	A = X
	caches = [(None,None,None,X)]  # 第0层(None,None,None,A0) w,b,z用none填充,下标与层数一致，用于存储每一层的，w,b,z,A
	# calculate from 1 to L-1 layer
	for l in range(1,L):
		A_pre = A
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		z = np.dot(W,A_pre) + b #计算z = wx + b
		A = relu(z) #relu activation function
		caches.append((W,b,z,A))
	# calculate Lth layer
	WL = parameters["W" + str(L)]
	bL = parameters["b" + str(L)]
	zL = np.dot(WL,A) + bL
	AL = sigmoid(zL)
	caches.append((WL,bL,zL,AL))
	return AL, caches
#calculate cost function

def compute_cost(AL,Y):
	"""
	:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)
	:param Y:真实值,shape:(1, number of examples)
	:return:
	"""
	m = Y.shape[1]
	cost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) + np.multiply(-np.log(1 - AL), 1 - Y))
	#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2
	cost = np.squeeze(cost)
	return cost


def compute_cost_with_regularization(AL, Y, parameters, lambd):
	"""
	Implement the cost function with L2 regularization. See formula (2) above.
	Arguments:
	A3 -- post-activation, output of forward propagation, of shape (output size, number of examples)
	Y -- "true" labels vector, of shape (output size, number of examples)
	parameters -- python dictionary containing parameters of the model
	Returns:
	cost - value of the regularized loss function
	"""
	m = Y.shape[1]
	cross_entropy_cost = compute_cost(AL, Y)  # This gives you the cross-entropy part of the cost
	L = len(parameters) // 2
	L2_regularization_cost = 0
	for l in range(0,L):
		L2_regularization_cost += (1. / m) * (lambd / 2.) * (np.sum(np.square(parameters["W" + str(l+1)])))
	cost = cross_entropy_cost + L2_regularization_cost

	return cost


# derivation of relu
def relu_backward(Z):
	"""
	:param Z: the input of activation
	:return:
	"""
	dA = np.int64(Z > 0)
	return dA

def backward_propagation_with_regularization(AL, Y, caches, lambd):
	"""
	Implement the backward propagation presented in figure 2.
	Arguments:
	AL: the output of last layer , i.e predict
	Y -- true "label" vector (containing 0 if cat, 1 if non-cat)
	caches -- caches output from forward_propagation(),(W,b,z,A)

	Returns:
	gradients -- A dictionary with the gradients with respect to dW,db
	"""
	m = Y.shape[1]
	L = len(caches) - 1
	# print("L:   " + str(L))
	#calculate the Lth layer gradients
	prev_AL = caches[L-1][3]
	dzL = 1./m * (AL - Y)
	# print(dzL.shape)
	# print(prev_AL.T.shape)
	dWL = np.dot(dzL, prev_AL.T) + lambd/m * caches[L][0]
	dbL = np.sum(dzL, axis=1, keepdims=True)
	gradients = {"dW" + str(L): dWL, "db" + str(L): dbL}
	#calculate from L-1 to 1 layer gradients
	for l in reversed(range(1,L)): # L-1,L-3,....,1
		post_W= caches[l+1][0] #要用后一层的W
		dz = dzL #用后一层的dz
		dal = np.dot(post_W.T, dz)
		z = caches[l][2]#当前层的z
		dzl = np.multiply(dal, relu_backward(z))#可以直接用dzl = np.multiply(dal, np.int64(Al > 0))来实现
		prev_A = caches[l-1][3]#前一层的A
		dWl = np.dot(dzl, prev_A.T) + lambd/m * caches[l][0]
		dbl = np.sum(dzl, axis=1, keepdims=True)

		gradients["dW" + str(l)] = dWl
		gradients["db" + str(l)] = dbl
		dzL = dzl #更新dz
	return gradients


def update_parameters(parameters, grads, learning_rate):
	"""
	:param parameters: dictionary,  W,b
	:param grads: dW,db
	:param learning_rate: alpha
	:return:
	"""
	L = len(parameters) // 2
	for l in range(L):
		parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l+1)]
		parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l+1)]
	return parameters

def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations,lambd):
	"""
	:param X:
	:param Y:
	:param layer_dims:list containing the input size and each layer size
	:param learning_rate:
	:param num_iterations:
	:return:
	parameters：final parameters:(W,b)
	"""
	costs = []
	# initialize parameters
	parameters = initialize_parameters(layer_dims)
	for i in range(0, num_iterations):
		#foward propagation
		AL,caches = forward_propagation(X, parameters)
		# calculate the cost
		cost = compute_cost_with_regularization(AL, Y, parameters, lambd)
		if i % 1000 == 0:
			print("Cost after iteration {}: {}".format(i, cost))
			costs.append(cost)
		#backward propagation
		grads = backward_propagation_with_regularization(AL, Y, caches, lambd)
		#update parameters
		parameters = update_parameters(parameters, grads, learning_rate)
	print('length of cost')
	print(len(costs))
	plt.clf()
	plt.plot(costs)  # o-:圆形
	plt.xlabel("iterations(thousand)")  # 横坐标名字
	plt.ylabel("cost")  # 纵坐标名字
	plt.show()
	return parameters

#predict function
def predict(X_test,y_test,parameters):
	"""
	:param X:
	:param y:
	:param parameters:
	:return:
	"""
	m = y_test.shape[1]
	Y_prediction = np.zeros((1, m))
	prob, caches = forward_propagation(X_test,parameters)
	for i in range(prob.shape[1]):
		# Convert probabilities A[0,i] to actual predictions p[0,i]
		if prob[0, i] > 0.5:
			Y_prediction[0, i] = 1
		else:
			Y_prediction[0, i] = 0
	accuracy = 1- np.mean(np.abs(Y_prediction - y_test))
	return accuracy
#DNN model
def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.001, num_iterations=20000, lambd = 0.):
	parameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations,lambd)
	accuracy = predict(X_test,y_test,parameters)
	return accuracy
if __name__ == "__main__":
	X_data, y_data = load_breast_cancer(return_X_y=True)
	X_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)
	X_train = X_train.T
	y_train = y_train.reshape(y_train.shape[0], -1).T
	X_test = X_test.T
	y_test = y_test.reshape(y_test.shape[0], -1).T
	# X_train, y_train, X_test, y_test = load_2D_dataset()
	accuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],10,5,1],lambd = 0.7)
	print(accuracy)

	# A3, Y_assess, parameters = compute_cost_with_regularization_test_case()
	# print("cost = " + str(compute_cost_with_regularization(A3, Y_assess, parameters, lambd=0.1)))

================================================
FILE: deep_neural_network_with_dropout.py
================================================
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import  load_breast_cancer
from sklearn.model_selection import train_test_split


#initialize parameters(w,b)
def initialize_parameters(layer_dims):
	"""
	:param layer_dims: list,每一层单元的个数（维度）
	:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL
	"""
	np.random.seed(3)
	L = len(layer_dims)#the number of layers in the network
	parameters = {}
	for l in range(1,L):
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization
		# parameters["W" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果
		parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization
		parameters["b" + str(l)] = np.zeros((layer_dims[l],1))
	return parameters

def relu(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	A: output of activation
	"""
	A = np.maximum(0,Z)
	return A
#implement the activation function(ReLU and sigmoid)
def sigmoid(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	"""
	A = 1 / (1 + np.exp(-Z))
	return A

def forward_propagation(X, parameters):
	"""
	X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2",...,"WL", "bL"
                    W -- weight matrix of shape (size of current layer, size of previous layer)
                    b -- bias vector of shape (size of current layer,1)
    :return:
	AL: the output of the last Layer(y_predict)
	caches: list, every element is a tuple:(W,b,z,A_pre)
	"""
	L = len(parameters) // 2  # number of layer
	A = X
	caches = [(None,None,None,X)]  # 第0层(None,None,None,A0) w,b,z用none填充,下标与层数一致，用于存储每一层的，w,b,z,A
	# calculate from 1 to L-1 layer
	for l in range(1,L):
		A_pre = A
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		z = np.dot(W,A_pre) + b #计算z = wx + b
		A = relu(z) #relu activation function
		caches.append((W,b,z,A))
	# calculate Lth layer
	WL = parameters["W" + str(L)]
	bL = parameters["b" + str(L)]
	zL = np.dot(WL,A) + bL
	AL = sigmoid(zL)
	caches.append((WL,bL,zL,AL))
	return AL, caches


#带dropout的深度神经网络
def forward_propagation_with_dropout(X, parameters, keep_prob = 0.8):
	"""
	X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2",...,"WL", "bL"
                    W -- weight matrix of shape (size of current layer, size of previous layer)
                    b -- bias vector of shape (size of current layer,1)
    keep_prob: probability of keeping a neuron active during drop-out, scalar
    :return:
	AL: the output of the last Layer(y_predict)
	caches: list, every element is a tuple:(W,b,z,A_pre)
	"""
	np.random.seed(1)
	L = len(parameters) // 2  # number of layer
	A = X
	caches = [(None,None,None,X,None)]  #用于存储每一层的，w,b,z,A,D第0层w,b,z用none代替
	# calculate from 1 to L-1 layer
	for l in range(1, L):
		A_pre = A
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		z = np.dot(W, A_pre) + b  # 计算z = wx + b
		A = relu(z)  # relu activation function
		D = np.random.rand(A.shape[0], A.shape[1]) #initialize matrix D
		D = (D < keep_prob) #convert entries of D to 0 or 1 (using keep_prob as the threshold)
		A = np.multiply(A, D) #shut down some neurons of A
		A = A / keep_prob #scale the value of neurons that haven't been shut down
		caches.append((W, b, z, A,D))
	# calculate Lth layer
	WL = parameters["W" + str(L)]
	bL = parameters["b" + str(L)]
	zL = np.dot(WL, A) + bL
	AL = sigmoid(zL)
	caches.append((WL, bL, zL, A))
	return AL, caches

#calculate cost function
def compute_cost(AL,Y):
	"""
	:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)
	:param Y:真实值,shape:(1, number of examples)
	:return:
	"""
	m = Y.shape[1]
	cost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) + np.multiply(-np.log(1 - AL), 1 - Y))
	#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2
	cost = np.squeeze(cost)
	return cost

# derivation of relu
def relu_backward(Z):
	"""
	:param Z: the input of activation
	:return:
	"""
	dA = np.int64(Z > 0)
	return dA

#带dropout的bp
def backward_propagation_with_dropout(AL, Y, caches, keep_prob = 0.8):
	"""
		Implement the backward propagation presented in figure 2.
		Arguments:
		X -- input dataset, of shape (input size, number of examples)
		Y -- true "label" vector (containing 0 if cat, 1 if non-cat)
		caches -- caches output from forward_propagation(),(W,b,z,pre_A)

		Returns:
		gradients -- A dictionary with the gradients with respect to dW,db
		"""
	m = Y.shape[1]
	L = len(caches) - 1
	# print("L:   " + str(L))
	# calculate the Lth layer gradients
	prev_AL = caches[L - 1][3]
	dzL = 1. / m * (AL - Y)
	# print(dzL.shape)
	# print(prev_AL.T.shape)
	dWL = np.dot(dzL, prev_AL.T)
	dbL = np.sum(dzL, axis=1, keepdims=True)
	gradients = {"dW" + str(L): dWL, "db" + str(L): dbL}
	# calculate from L-1 to 1 layer gradients
	for l in reversed(range(1, L)): # L-1,L-2,...,1
		post_W = caches[l + 1][0]  # 要用后一层的W
		dz = dzL  # 用后一层的dz

		dal = np.dot(post_W.T, dz)
		Dl = caches[l][4] #当前层的D
		dal = np.multiply(dal, Dl)#Apply mask Dl to shut down the same neurons as during the forward propagation
		dal = dal / keep_prob #Scale the value of neurons that haven't been shut down
		z = caches[l][2]  # 当前层的Z
		dzl = np.multiply(dal, relu_backward(z))
		prev_A = caches[l-1][3]  # 前一层的A
		dWl = np.dot(dzl, prev_A.T)
		dbl = np.sum(dzl, axis=1, keepdims=True)

		gradients["dW" + str(l)] = dWl
		gradients["db" + str(l)] = dbl
		dzL = dzl  # 更新dz
	return gradients

def update_parameters(parameters, grads, learning_rate):
	"""
	:param parameters: dictionary,  W,b
	:param grads: dW,db
	:param learning_rate: alpha
	:return:
	"""
	L = len(parameters) // 2
	for l in range(L):
		parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l+1)]
		parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l+1)]
	return parameters

def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations,keep_prob):
	"""
	:param X:
	:param Y:
	:param layer_dims:list containing the input size and each layer size
	:param learning_rate:
	:param num_iterations:
	:return:
	parameters：final parameters:(W,b)
	"""
	costs = []
	# initialize parameters
	parameters = initialize_parameters(layer_dims)
	for i in range(0, num_iterations):
		#foward propagation
		AL,caches = forward_propagation_with_dropout(X, parameters, keep_prob)
		# calculate the cost
		cost = compute_cost(AL, Y)
		if i % 1000 == 0:
			print("Cost after iteration {}: {}".format(i, cost))
			costs.append(cost)
		#backward propagation
		grads = backward_propagation_with_dropout(AL, Y, caches, keep_prob)
		#update parameters
		parameters = update_parameters(parameters, grads, learning_rate)
	print('length of cost')
	print(len(costs))
	plt.clf()
	plt.plot(costs)  # o-:圆形
	plt.xlabel("iterations(thousand)")  # 横坐标名字
	plt.ylabel("cost")  # 纵坐标名字
	plt.show()
	return parameters

#predict function
def predict(X_test,y_test,parameters):
	"""
	:param X:
	:param y:
	:param parameters:
	:return:
	"""
	m = y_test.shape[1]
	Y_prediction = np.zeros((1, m))
	prob, caches = forward_propagation(X_test,parameters)
	for i in range(prob.shape[1]):
		# Convert probabilities A[0,i] to actual predictions p[0,i]
		### START CODE HERE ### (≈ 4 lines of code)
		if prob[0, i] > 0.5:
			Y_prediction[0, i] = 1
		else:
			Y_prediction[0, i] = 0
	accuracy = 1- np.mean(np.abs(Y_prediction - y_test))
	return accuracy
#DNN model
def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.001, num_iterations=20000, keep_prob = 1.):
	parameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations, keep_prob)
	accuracy = predict(X_test,y_test,parameters)
	return accuracy
if __name__ == "__main__":
	X_data, y_data = load_breast_cancer(return_X_y=True)
	X_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)
	X_train = X_train.T
	y_train = y_train.reshape(y_train.shape[0], -1).T
	X_test = X_test.T
	y_test = y_test.reshape(y_test.shape[0], -1).T
	# X_train, y_train, X_test, y_test = load_2D_dataset()
	accuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],10,5,1], keep_prob = 0.86)
	print(accuracy)

	# X_assess, parameters = forward_propagation_with_dropout_test_case()
	#
	# A3, cache = forward_propagation_with_dropout(X_assess, parameters, keep_prob=0.7)
	# print("A3 = " + str(A3))

================================================
FILE: deep_neural_network_with_gd.py
================================================
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import  load_breast_cancer
from sklearn.model_selection import train_test_split
#initialize parameters(w,b)
def initialize_parameters(layer_dims):
	"""
	:param layer_dims: list,每一层单元的个数（维度）
	:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL
	"""
	np.random.seed(3)
	L = len(layer_dims)#the number of layers in the network
	parameters = {}
	for l in range(1,L):
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01
		parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization
		# parameters["W" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization
		parameters["b" + str(l)] = np.zeros((layer_dims[l],1))
	return parameters
def relu(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	A: output of activation
	"""
	A = np.maximum(0,Z)
	return A
#implement the activation function(ReLU and sigmoid)
def sigmoid(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	"""
	A = 1 / (1 + np.exp(-Z))
	return A

def forward_propagation(X, parameters):
	"""
	X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2",...,"WL", "bL"
                    W -- weight matrix of shape (size of current layer, size of previous layer)
                    b -- bias vector of shape (size of current layer,1)
    :return:
	AL: the output of the last Layer(y_predict)
	caches: list, every element is a tuple:(W,b,z,A_pre)
	"""
	L = len(parameters) // 2  # number of layer
	A = X
	caches = [(None,None,None,X)]  # 第0层(None,None,None,A0) w,b,z用none填充,下标与层数一致，用于存储每一层的，w,b,z,A
	# calculate from 1 to L-1 layer
	for l in range(1,L):
		A_pre = A
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		z = np.dot(W,A_pre) + b #计算z = wx + b
		A = relu(z) #relu activation function
		caches.append((W,b,z,A))
	# calculate Lth layer
	WL = parameters["W" + str(L)]
	bL = parameters["b" + str(L)]
	zL = np.dot(WL,A) + bL
	AL = sigmoid(zL)
	caches.append((WL,bL,zL,AL))
	return AL, caches
#calculate cost function
def compute_cost(AL,Y):
	"""
	:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)
	:param Y:真实值,shape:(1, number of examples)
	:return:
	"""
	m = Y.shape[1]
	# cost = -1.0/m * np.sum(Y*np.log(AL)+(1-Y)*np.log(1.0 - AL))#py中*是点乘
	# cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T)) #推荐用这个，上面那个容易出错
	cost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) +
	                          np.multiply(-np.log(1 - AL), 1 - Y))
	#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2
	cost = np.squeeze(cost)
	# print('=====================cost===================')
	# print(cost)
	return cost
	
# derivation of relu
def relu_backward(Z):
	"""
	:param Z: the input of activation
	:return:
	"""
	dA = np.int64(Z > 0)
	return dA

def backward_propagation(AL, Y, caches):
	"""
	Implement the backward propagation presented in figure 2.
	Arguments:
	X -- input dataset, of shape (input size, number of examples)
	Y -- true "label" vector (containing 0 if cat, 1 if non-cat)
	caches -- caches output from forward_propagation(),(W,b,z,pre_A)

	Returns:
	gradients -- A dictionary with the gradients with respect to dW,db
	"""
	m = Y.shape[1]
	L = len(caches) - 1
	# print("L:   " + str(L))
	#calculate the Lth layer gradients
	prev_AL = caches[L-1][3]
	dzL = 1./m * (AL - Y)
	# print(dzL.shape)
	# print(prev_AL.T.shape)
	dWL = np.dot(dzL, prev_AL.T)
	dbL = np.sum(dzL, axis=1, keepdims=True)
	gradients = {"dW"+str(L):dWL, "db"+str(L):dbL}
	#calculate from L-1 to 1 layer gradients
	for l in reversed(range(1,L)): # L-1,L-3,....,1
		post_W= caches[l+1][0] #要用后一层的W
		dz = dzL #用后一层的dz

		dal = np.dot(post_W.T, dz)
		z = caches[l][2]#当前层的z
		dzl = np.multiply(dal, relu_backward(z))
		prev_A = caches[l-1][3]#前一层的A
		dWl = np.dot(dzl, prev_A.T)
		dbl = np.sum(dzl, axis=1, keepdims=True)

		gradients["dW" + str(l)] = dWl
		gradients["db" + str(l)] = dbl
		dzL = dzl #更新dz
	return gradients

def update_parameters(parameters, grads, learning_rate):
	"""
	:param parameters: dictionary,  W,b
	:param grads: dW,db
	:param learning_rate: alpha
	:return:
	"""
	L = len(parameters) // 2
	for l in range(L):
		parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l+1)]
		parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l+1)]
	return parameters


def random_mini_batches(X, Y, mini_batch_size = 64, seed=1):
	"""
	Creates a list of random minibatches from (X, Y)
	Arguments:
	X -- input data, of shape (input size, number of examples)
	Y -- true "label" vector (1 for blue dot / 0 for red dot), of shape (1, number of examples)
	mini_batch_size -- size of the mini-batches, integer

	Returns:
	mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
	"""
	np.random.seed(seed)
	m = X.shape[1]  # number of training examples
	mini_batches = []

	# Step 1: Shuffle (X, Y)
	permutation = list(np.random.permutation(m))
	shuffled_X = X[:, permutation]
	shuffled_Y = Y[:, permutation].reshape((1, m))

	# Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
	num_complete_minibatches = m // mini_batch_size  # number of mini batches of size mini_batch_size in your partitionning
	for k in range(0, num_complete_minibatches):
		mini_batch_X = shuffled_X[:, k * mini_batch_size: (k + 1) * mini_batch_size]
		mini_batch_Y = shuffled_Y[:, k * mini_batch_size: (k + 1) * mini_batch_size]
		mini_batch = (mini_batch_X, mini_batch_Y)
		mini_batches.append(mini_batch)

	# Handling the end case (last mini-batch < mini_batch_size)
	if m % mini_batch_size != 0:
		mini_batch_X = shuffled_X[:, num_complete_minibatches * mini_batch_size: m]
		mini_batch_Y = shuffled_Y[:, num_complete_minibatches * mini_batch_size: m]
		mini_batch = (mini_batch_X, mini_batch_Y)
		mini_batches.append(mini_batch)

	return mini_batches

def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations, gradient_descent = 'bgd',mini_batch_size = 64):
	"""
	:param X:
	:param Y:
	:param layer_dims:list containing the input size and each layer size
	:param learning_rate:
	:param num_iterations:
	:return:
	parameters：final parameters:(W,b)
	"""
	m = Y.shape[1]
	costs = []
	# initialize parameters
	parameters = initialize_parameters(layer_dims)
	if gradient_descent =='bgd':
		for i in range(0, num_iterations):
			#foward propagation
			AL,caches = forward_propagation(X, parameters)
			# calculate the cost
			cost = compute_cost(AL, Y)
			if i % 1000 == 0:
				print("Cost after iteration {}: {}".format(i, cost))
				costs.append(cost)
			#backward propagation
			grads = backward_propagation(AL, Y, caches)
			#update parameters
			parameters = update_parameters(parameters, grads, learning_rate)
	elif gradient_descent == 'sgd':
		np.random.seed(3)
		# 把数据集打乱，这个很重要
		permutation = list(np.random.permutation(m))
		shuffled_X = X[:, permutation]
		shuffled_Y = Y[:, permutation].reshape((1, m))
		for i in range(0, num_iterations):
			for j in range(0, m):  # 每次训练一个样本
				# Forward propagation
				AL,caches = forward_propagation(shuffled_X[:, j].reshape(-1,1), parameters)
				# Compute cost
				cost = compute_cost(AL, shuffled_Y[:, j].reshape(1,1))
				# Backward propagation
				grads = backward_propagation(AL, shuffled_Y[:,j].reshape(1,1), caches)
				# Update parameters.
				parameters = update_parameters(parameters, grads, learning_rate)
				if j % 20 == 0:
					print("example size {}: {}".format(j, cost))
					costs.append(cost)
	elif gradient_descent == 'mini-batch':
		seed = 0
		for i in range(0, num_iterations):
			# Define the random minibatches. We increment the seed to reshuffle differently the dataset after each epoch
			seed = seed + 1
			minibatches = random_mini_batches(X, Y, mini_batch_size, seed)
			for minibatch in minibatches:
				# Select a minibatch
				(minibatch_X, minibatch_Y) = minibatch
				# Forward propagation
				AL, caches = forward_propagation(minibatch_X, parameters)
				# Compute cost
				cost = compute_cost(AL, minibatch_Y)
				# Backward propagation
				grads = backward_propagation(AL, minibatch_Y, caches)
				parameters = update_parameters(parameters, grads, learning_rate)
			if i % 100 == 0:
				print("Cost after iteration {}: {}".format(i, cost))
				costs.append(cost)
	print('length of cost')
	print(len(costs))
	plt.clf()
	plt.plot(costs)
	plt.xlabel("iterations(hundred)")  # 横坐标名字
	plt.ylabel("cost")  # 纵坐标名字
	plt.show()
	return parameters

#predict function
def predict(X_test,y_test,parameters):
	"""
	:param X:
	:param y:
	:param parameters:
	:return:
	"""
	m = y_test.shape[1]
	Y_prediction = np.zeros((1, m))
	prob, caches = forward_propagation(X_test,parameters)
	for i in range(prob.shape[1]):
		# Convert probabilities A[0,i] to actual predictions p[0,i]
		if prob[0, i] > 0.5:
			Y_prediction[0, i] = 1
		else:
			Y_prediction[0, i] = 0
	accuracy = 1- np.mean(np.abs(Y_prediction - y_test))
	return accuracy
#DNN model
def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0006, num_iterations=30000, gradient_descent = 'bgd',mini_batch_size = 64):
	parameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations,gradient_descent,mini_batch_size)
	accuracy = predict(X_test,y_test,parameters)
	return accuracy

if __name__ == "__main__":
	X_data, y_data = load_breast_cancer(return_X_y=True)
	X_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)
	X_train = X_train.T
	y_train = y_train.reshape(y_train.shape[0], -1).T
	X_test = X_test.T
	y_test = y_test.reshape(y_test.shape[0], -1).T
	#use bgd
	accuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],10,5,1])
	print(accuracy)
	#use sgd
	accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1],num_iterations=5, gradient_descent = 'sgd')
	print(accuracy)
	#mini-batch
	accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], num_iterations=10000,gradient_descent='mini-batch')
	print(accuracy)

================================================
FILE: deep_neural_network_with_optimizers.py
================================================
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import  load_breast_cancer
from sklearn.model_selection import train_test_split
#initialize parameters(w,b)
def initialize_parameters(layer_dims):
	"""
	:param layer_dims: list,每一层单元的个数（维度）
	:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL
	"""
	np.random.seed(3)
	L = len(layer_dims)#the number of layers in the network
	parameters = {}
	for l in range(1,L):
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01
		parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization
		# parameters["W" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization
		parameters["b" + str(l)] = np.zeros((layer_dims[l],1))
	return parameters
def relu(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	A: output of activation
	"""
	A = np.maximum(0,Z)
	return A
#implement the activation function(ReLU and sigmoid)
def sigmoid(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	"""
	A = 1 / (1 + np.exp(-Z))
	return A

def forward_propagation(X, parameters):
	"""
	X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2",...,"WL", "bL"
                    W -- weight matrix of shape (size of current layer, size of previous layer)
                    b -- bias vector of shape (size of current layer,1)
    :return:
	AL: the output of the last Layer(y_predict)
	caches: list, every element is a tuple:(W,b,z,A_pre)
	"""
	L = len(parameters) // 2  # number of layer
	A = X
	caches = [(None,None,None,X)]  # 第0层(None,None,None,A0) w,b,z用none填充,下标与层数一致，用于存储每一层的，w,b,z,A
	# calculate from 1 to L-1 layer
	for l in range(1,L):
		A_pre = A
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		z = np.dot(W,A_pre) + b #计算z = wx + b
		A = relu(z) #relu activation function
		caches.append((W,b,z,A))
	# calculate Lth layer
	WL = parameters["W" + str(L)]
	bL = parameters["b" + str(L)]
	zL = np.dot(WL,A) + bL
	AL = sigmoid(zL)
	caches.append((WL,bL,zL,AL))
	return AL, caches
#calculate cost function
def compute_cost(AL,Y):
	"""
	:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)
	:param Y:真实值,shape:(1, number of examples)
	:return:
	"""
	m = Y.shape[1]
	# cost = -1.0/m * np.sum(Y*np.log(AL)+(1-Y)*np.log(1.0 - AL))#py中*是点乘
	# cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T)) #推荐用这个，上面那个容易出错
	cost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) +
	                          np.multiply(-np.log(1 - AL), 1 - Y))
	#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2
	cost = np.squeeze(cost)
	# print('=====================cost===================')
	# print(cost)
	return cost

# derivation of relu
def relu_backward(Z):
	"""
	:param Z: the input of activation
	:return:
	"""
	dA = np.int64(Z > 0)
	return dA

def backward_propagation(AL, Y, caches):
	"""
	Implement the backward propagation presented in figure 2.
	Arguments:
	X -- input dataset, of shape (input size, number of examples)
	Y -- true "label" vector (containing 0 if cat, 1 if non-cat)
	caches -- caches output from forward_propagation(),(W,b,z,pre_A)

	Returns:
	gradients -- A dictionary with the gradients with respect to dW,db
	"""
	m = Y.shape[1]
	L = len(caches) - 1
	# print("L:   " + str(L))
	#calculate the Lth layer gradients
	prev_AL = caches[L-1][3]
	dzL = 1./m * (AL - Y)
	# print(dzL.shape)
	# print(prev_AL.T.shape)
	dWL = np.dot(dzL, prev_AL.T)
	dbL = np.sum(dzL, axis=1, keepdims=True)
	gradients = {"dW"+str(L):dWL, "db"+str(L):dbL}
	#calculate from L-1 to 1 layer gradients
	for l in reversed(range(1,L)): # L-1,L-3,....,1
		post_W= caches[l+1][0] #要用后一层的W
		dz = dzL #用后一层的dz

		dal = np.dot(post_W.T, dz)
		z = caches[l][2]#当前层的z
		dzl = np.multiply(dal, relu_backward(z))
		prev_A = caches[l-1][3]#前一层的A
		dWl = np.dot(dzl, prev_A.T)
		dbl = np.sum(dzl, axis=1, keepdims=True)

		gradients["dW" + str(l)] = dWl
		gradients["db" + str(l)] = dbl
		dzL = dzl #更新dz
	return gradients

def update_parameters_with_gd(parameters, grads, learning_rate):
	"""
	:param parameters: dictionary,  W,b
	:param grads: dW,db
	:param learning_rate: alpha
	:return:
	"""
	L = len(parameters) // 2
	for l in range(L):
		parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l+1)]
		parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l+1)]
	return parameters


def random_mini_batches(X, Y, mini_batch_size = 64, seed=1):
	"""
	Creates a list of random minibatches from (X, Y)
	Arguments:
	X -- input data, of shape (input size, number of examples)
	Y -- true "label" vector (1 for blue dot / 0 for red dot), of shape (1, number of examples)
	mini_batch_size -- size of the mini-batches, integer

	Returns:
	mini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)
	"""
	np.random.seed(seed)
	m = X.shape[1]  # number of training examples
	mini_batches = []

	# Step 1: Shuffle (X, Y)
	permutation = list(np.random.permutation(m))
	shuffled_X = X[:, permutation]
	shuffled_Y = Y[:, permutation].reshape((1, m))

	# Step 2: Partition (shuffled_X, shuffled_Y). Minus the end case.
	num_complete_minibatches = m // mini_batch_size  # number of mini batches of size mini_batch_size in your partitionning
	for k in range(0, num_complete_minibatches):
		mini_batch_X = shuffled_X[:, k * mini_batch_size: (k + 1) * mini_batch_size]
		mini_batch_Y = shuffled_Y[:, k * mini_batch_size: (k + 1) * mini_batch_size]
		mini_batch = (mini_batch_X, mini_batch_Y)
		mini_batches.append(mini_batch)

	# Handling the end case (last mini-batch < mini_batch_size)
	if m % mini_batch_size != 0:
		mini_batch_X = shuffled_X[:, num_complete_minibatches * mini_batch_size: m]
		mini_batch_Y = shuffled_Y[:, num_complete_minibatches * mini_batch_size: m]
		mini_batch = (mini_batch_X, mini_batch_Y)
		mini_batches.append(mini_batch)

	return mini_batches


def initialize_velocity(parameters):
	"""
	Initializes the velocity as a python dictionary with:
				- keys: "dW1", "db1", ..., "dWL", "dbL"
				- values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters.
	Arguments:
	parameters -- python dictionary containing your parameters.
					parameters['W' + str(l)] = Wl
					parameters['b' + str(l)] = bl

	Returns:
	v -- python dictionary containing the current velocity.
					v['dW' + str(l)] = velocity of dWl
					v['db' + str(l)] = velocity of dbl
	"""
	L = len(parameters) // 2  # number of layers in the neural networks
	v = {}
	# Initialize velocity
	for l in range(L):
		v["dW" + str(l + 1)] = np.zeros(parameters["W" + str(l + 1)].shape)
		v["db" + str(l + 1)] = np.zeros(parameters["b" + str(l + 1)].shape)
	return v

#momentum
def update_parameters_with_momentum(parameters, grads, v, beta, learning_rate):
	"""
	Update parameters using Momentum
	Arguments:
	parameters -- python dictionary containing your parameters:
					parameters['W' + str(l)] = Wl
					parameters['b' + str(l)] = bl
	grads -- python dictionary containing your gradients for each parameters:
					grads['dW' + str(l)] = dWl
					grads['db' + str(l)] = dbl
	v -- python dictionary containing the current velocity:
					v['dW' + str(l)] = ...
					v['db' + str(l)] = ...
	beta -- the momentum hyperparameter, scalar
	learning_rate -- the learning rate, scalar

	Returns:
	parameters -- python dictionary containing your updated parameters

	'''
	VdW = beta * VdW + (1-beta) * dW
	Vdb = beta * Vdb + (1-beta) * db
	W = W - learning_rate * VdW
	b = b - learning_rate * Vdb
	'''
	"""


	L = len(parameters) // 2  # number of layers in the neural networks

	# Momentum update for each parameter
	for l in range(L):
		# compute velocities
		v["dW" + str(l + 1)] = beta * v["dW" + str(l + 1)] + (1 - beta) * grads['dW' + str(l + 1)]
		v["db" + str(l + 1)] = beta * v["db" + str(l + 1)] + (1 - beta) * grads['db' + str(l + 1)]
		# update parameters
		parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * v["dW" + str(l + 1)]
		parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * v["db" + str(l + 1)]

	return parameters

#nesterov momentum
def update_parameters_with_nesterov_momentum(parameters, grads, v, beta, learning_rate):
	"""
	Update parameters using Momentum
	Arguments:
	parameters -- python dictionary containing your parameters:
					parameters['W' + str(l)] = Wl
					parameters['b' + str(l)] = bl
	grads -- python dictionary containing your gradients for each parameters:
					grads['dW' + str(l)] = dWl
					grads['db' + str(l)] = dbl
	v -- python dictionary containing the current velocity:
					v['dW' + str(l)] = ...
					v['db' + str(l)] = ...
	beta -- the momentum hyperparameter, scalar
	learning_rate -- the learning rate, scalar

	Returns:
	parameters -- python dictionary containing your updated parameters
	v -- python dictionary containing your updated velocities

	'''
	VdW = beta * VdW - learning_rate * dW
	Vdb = beta * Vdb - learning_rate * db
	W = W + beta * VdW - learning_rate * dW
	b = b + beta * Vdb - learning_rate * db
	'''
	"""

	L = len(parameters) // 2  # number of layers in the neural networks

	# Momentum update for each parameter
	for l in range(L):
		# compute velocities
		v["dW" + str(l + 1)] = beta * v["dW" + str(l + 1)] - learning_rate * grads['dW' + str(l + 1)]
		v["db" + str(l + 1)] = beta * v["db" + str(l + 1)] - learning_rate * grads['db' + str(l + 1)]
		# update parameters
		parameters["W" + str(l + 1)] += beta * v["dW" + str(l + 1)]- learning_rate * grads['dW' + str(l + 1)]
		parameters["b" + str(l + 1)] += beta * v["db" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]

	return parameters


#AdaGrad initialization
def initialize_adagrad(parameters):
	"""
	Initializes the velocity as a python dictionary with:
				- keys: "dW1", "db1", ..., "dWL", "dbL"
				- values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters.
	Arguments:
	parameters -- python dictionary containing your parameters.
					parameters['W' + str(l)] = Wl
					parameters['b' + str(l)] = bl

	Returns:
	Gt -- python dictionary containing sum of the squares of the gradients up to step t.
					G['dW' + str(l)] = sum of the squares of the gradients up to dwl
					G['db' + str(l)] = sum of the squares of the gradients up to db1
	"""
	L = len(parameters) // 2  # number of layers in the neural networks
	G = {}
	# Initialize velocity
	for l in range(L):
		G["dW" + str(l + 1)] = np.zeros(parameters["W" + str(l + 1)].shape)
		G["db" + str(l + 1)] = np.zeros(parameters["b" + str(l + 1)].shape)
	return G

#AdaGrad
def update_parameters_with_adagrad(parameters, grads, G, learning_rate, epsilon = 1e-7):
	"""
	Update parameters using Momentum
	Arguments:
	parameters -- python dictionary containing your parameters:
					parameters['W' + str(l)] = Wl
					parameters['b' + str(l)] = bl
	grads -- python dictionary containing your gradients for each parameters:
					grads['dW' + str(l)] = dWl
					grads['db' + str(l)] = dbl
	G -- python dictionary containing the current velocity:
					G['dW' + str(l)] = ...
					G['db' + str(l)] = ...
	learning_rate -- the learning rate, scalar
	epsilon -- hyperparameter preventing division by zero in adagrad updates

	Returns:
	parameters -- python dictionary containing your updated parameters

	'''
	GW += (dW)^2
	W -= learning_rate/sqrt(GW + epsilon)*dW
	Gb += (db)^2
	b -= learning_rate/sqrt(Gb + epsilon)*db
	'''
	"""

	L = len(parameters) // 2  # number of layers in the neural networks

	# Momentum update for each parameter
	for l in range(L):
		# compute velocities
		G["dW" + str(l + 1)] += grads['dW' + str(l + 1)]**2
		G["db" + str(l + 1)] += grads['db' + str(l + 1)]**2
		# update parameters
		parameters["W" + str(l + 1)] -= learning_rate / (np.sqrt(G["dW" + str(l + 1)]) + epsilon) * grads['dW' + str(l + 1)]
		parameters["b" + str(l + 1)] -= learning_rate / (np.sqrt(G["db" + str(l + 1)]) + epsilon) * grads['db' + str(l + 1)]

	return parameters


#initialize_adadelta
def initialize_adadelta(parameters):
	"""
	Initializes s and delta as two python dictionaries with:
				- keys: "dW1", "db1", ..., "dWL", "dbL"
				- values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters.

	Arguments:
	parameters -- python dictionary containing your parameters.
					parameters["W" + str(l)] = Wl
					parameters["b" + str(l)] = bl

	Returns:
	s -- python dictionary that will contain the exponentially weighted average of the squared gradient of dw
					s["dW" + str(l)] = ...
					s["db" + str(l)] = ...
	v -- python dictionary that will contain the RMS
				v["dW" + str(l)] = ...
				v["db" + str(l)] = ...
	delta -- python dictionary that will contain the exponentially weighted average of the squared gradient of delta_w
					delta["dW" + str(l)] = ...
					delta["db" + str(l)] = ...

	"""

	L = len(parameters) // 2  # number of layers in the neural networks
	s = {}
	v = {}
	delta = {}
	# Initialize s, v, delta. Input: "parameters". Outputs: "s, v, delta".
	for l in range(L):
		s["dW" + str(l + 1)] = np.zeros(parameters["W" + str(l + 1)].shape)
		s["db" + str(l + 1)] = np.zeros(parameters["b" + str(l + 1)].shape)
		v["dW" + str(l + 1)] = np.zeros(parameters["W" + str(l + 1)].shape)
		v["db" + str(l + 1)] = np.zeros(parameters["b" + str(l + 1)].shape)
		delta["dW" + str(l + 1)] = np.zeros(parameters["W" + str(l + 1)].shape)
		delta["db" + str(l + 1)] = np.zeros(parameters["b" + str(l + 1)].shape)

	return s, v, delta

#adadelta
def update_parameters_with_adadelta(parameters, grads, rho, s, v, delta, epsilon = 1e-6):
	"""
	Update parameters using Momentum
	Arguments:
	parameters -- python dictionary containing your parameters:
					parameters['W' + str(l)] = Wl
					parameters['b' + str(l)] = bl
	grads -- python dictionary containing your gradients for each parameters:
					grads['dW' + str(l)] = dWl
					grads['db' + str(l)] = dbl
	rho -- decay constant similar to that used in the momentum method, scalar
	s -- python dictionary containing the current velocity:
					s['dW' + str(l)] = ...
					s['db' + str(l)] = ...
	delta -- python dictionary containing the current RMS:
					delta['dW' + str(l)] = ...
					delta['db' + str(l)] = ...

	epsilon -- hyperparameter preventing division by zero in adagrad updates

	Returns:
	parameters -- python dictionary containing your updated parameters

	'''
	Sdw = rho*Sdw + (1 - rho)*(dW)^2
	Sdb = rho*Sdb + (1 - rho)*(db)^2
	Vdw = sqrt((delta_w + epsilon) / (Sdw + epsilon))*dW
	Vdb = sqrt((delta_b + epsilon) / (Sdb + epsilon))*dW
	W -= Vdw
	b -= Vdb
	delta_w = rho*delta_w + (1 - rho)*(Vdw)^2
	delta_b = rho*delta_b + (1 - rho)*(Vdb)^2
	'''
	"""

	L = len(parameters) // 2  # number of layers in the neural networks
	# adadelta update for each parameter
	for l in range(L):
		# compute s
		s["dW" + str(l + 1)] = rho * s["dW" + str(l + 1)] + (1 - rho)*grads['dW' + str(l + 1)]**2
		s["db" + str(l + 1)] = rho * s["db" + str(l + 1)] + (1 - rho)*grads['db' + str(l + 1)]**2
		#compute RMS
		v["dW" + str(l + 1)] = np.sqrt((delta["dW" + str(l + 1)] + epsilon) / (s["dW" + str(l + 1)] + epsilon)) * grads['dW' + str(l + 1)]
		v["db" + str(l + 1)] = np.sqrt((delta["db" + str(l + 1)] + epsilon) / (s["db" + str(l + 1)] + epsilon)) * grads['db' + str(l + 1)]
		# update parameters
		parameters["W" + str(l + 1)] -= v["dW" + str(l + 1)]
		parameters["b" + str(l + 1)] -= v["db" + str(l + 1)]
		#compute delta
		delta["dW" + str(l + 1)] = rho * delta["dW" + str(l + 1)] + (1 - rho) * v["dW" + str(l + 1)] ** 2
		delta["db" + str(l + 1)] = rho * delta["db" + str(l + 1)] + (1 - rho) * v["db" + str(l + 1)] ** 2

	return parameters

#RMSprop
def update_parameters_with_rmsprop(parameters, grads, s, beta = 0.9, learning_rate = 0.01, epsilon = 1e-6):
	"""
	Update parameters using Momentum
	Arguments:
	parameters -- python dictionary containing your parameters:
					parameters['W' + str(l)] = Wl
					parameters['b' + str(l)] = bl
	grads -- python dictionary containing your gradients for each parameters:
					grads['dW' + str(l)] = dWl
					grads['db' + str(l)] = dbl
	s -- python dictionary containing the current velocity:
					v['dW' + str(l)] = ...
					v['db' + str(l)] = ...
	beta -- the momentum hyperparameter, scalar
	learning_rate -- the learning rate, scalar

	Returns:
	parameters -- python dictionary containing your updated parameters
	'''
	SdW = beta * SdW + (1-beta) * (dW)^2
	sdb = beta * Sdb + (1-beta) * (db)^2
	W = W - learning_rate * dW/sqrt(SdW + epsilon)
	b = b - learning_rate * db/sqrt(Sdb + epsilon)
	'''
	"""
	L = len(parameters) // 2  # number of layers in the neural networks
	# rmsprop update for each parameter
	for l in range(L):
		# compute velocities
		s["dW" + str(l + 1)] = beta * s["dW" + str(l + 1)] + (1 - beta) * grads['dW' + str(l + 1)]**2
		s["db" + str(l + 1)] = beta * s["db" + str(l + 1)] + (1 - beta) * grads['db' + str(l + 1)]**2
		# update parameters
		parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads['dW' + str(l + 1)] / np.sqrt(s["dW" + str(l + 1)] + epsilon)
		parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads['db' + str(l + 1)] / np.sqrt(s["db" + str(l + 1)] + epsilon)

	return parameters

#initialize adam
def initialize_adam(parameters):
	"""
	Initializes v and s as two python dictionaries with:
				- keys: "dW1", "db1", ..., "dWL", "dbL"
				- values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters.
	Arguments:
	parameters -- python dictionary containing your parameters.
					parameters["W" + str(l)] = Wl
					parameters["b" + str(l)] = bl
	Returns:
	v -- python dictionary that will contain the exponentially weighted average of the gradient.
					v["dW" + str(l)] = ...
					v["db" + str(l)] = ...
	s -- python dictionary that will contain the exponentially weighted average of the squared gradient.
					s["dW" + str(l)] = ...
					s["db" + str(l)] = ...

	"""
	L = len(parameters) // 2  # number of layers in the neural networks
	v = {}
	s = {}
	# Initialize v, s. Input: "parameters". Outputs: "v, s".
	for l in range(L):
		v["dW" + str(l + 1)] = np.zeros(parameters["W" + str(l + 1)].shape)
		v["db" + str(l + 1)] = np.zeros(parameters["b" + str(l + 1)].shape)
		s["dW" + str(l + 1)] = np.zeros(parameters["W" + str(l + 1)].shape)
		s["db" + str(l + 1)] = np.zeros(parameters["b" + str(l + 1)].shape)

	return v, s

#adam
def update_parameters_with_adam(parameters, grads, v, s, t, learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-8):
	"""
	Update parameters using Adam

	Arguments:
	parameters -- python dictionary containing your parameters:
					parameters['W' + str(l)] = Wl
					parameters['b' + str(l)] = bl
	grads -- python dictionary containing your gradients for each parameters:
					grads['dW' + str(l)] = dWl
					grads['db' + str(l)] = dbl
	v -- Adam variable, moving average of the first gradient, python dictionary
	s -- Adam variable, moving average of the squared gradient, python dictionary
	learning_rate -- the learning rate, scalar.
	beta1 -- Exponential decay hyperparameter for the first moment estimates
	beta2 -- Exponential decay hyperparameter for the second moment estimates
	epsilon -- hyperparameter preventing division by zero in Adam updates

	Returns:
	parameters -- python dictionary containing your updated parameters
	"""

	L = len(parameters) // 2  # number of layers in the neural networks
	v_corrected = {}  # Initializing first moment estimate, python dictionary
	s_corrected = {}  # Initializing second moment estimate, python dictionary

	# Perform Adam update on all parameters
	for l in range(L):
		# Moving average of the gradients. Inputs: "v, grads, beta1". Output: "v".
		v["dW" + str(l + 1)] = beta1 * v["dW" + str(l + 1)] + (1 - beta1) * grads['dW' + str(l + 1)]
		v["db" + str(l + 1)] = beta1 * v["db" + str(l + 1)] + (1 - beta1) * grads['db' + str(l + 1)]
		# Compute bias-corrected first moment estimate. Inputs: "v, beta1, t". Output: "v_corrected".
		v_corrected["dW" + str(l + 1)] = v["dW" + str(l + 1)] / (1 - np.power(beta1, t))
		v_corrected["db" + str(l + 1)] = v["db" + str(l + 1)] / (1 - np.power(beta1, t))
		# Moving average of the squared gradients. Inputs: "s, grads, beta2". Output: "s".
		s["dW" + str(l + 1)] = beta2 * s["dW" + str(l + 1)] + (1 - beta2) * np.power(grads['dW' + str(l + 1)], 2)
		s["db" + str(l + 1)] = beta2 * s["db" + str(l + 1)] + (1 - beta2) * np.power(grads['db' + str(l + 1)], 2)
		# Compute bias-corrected second raw moment estimate. Inputs: "s, beta2, t". Output: "s_corrected".
		s_corrected["dW" + str(l + 1)] = s["dW" + str(l + 1)] / (1 - np.power(beta2, t))
		s_corrected["db" + str(l + 1)] = s["db" + str(l + 1)] / (1 - np.power(beta2, t))
		# Update parameters. Inputs: "parameters, learning_rate, v_corrected, s_corrected, epsilon". Output: "parameters".
		parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * v_corrected["dW" + str(l + 1)] / np.sqrt(s_corrected["dW" + str(l + 1)] + epsilon)
		parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * v_corrected["db" + str(l + 1)] / np.sqrt(s_corrected["db" + str(l + 1)] + epsilon)

	return parameters


def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations, optimizer, beta = 0.9, beta2 = 0.999, mini_batch_size = 64, epsilon = 1e-8):
	"""
	:param X:
	:param Y:
	:param layer_dims:list containing the input size and each layer size
	:param learning_rate:
	:param num_iterations:
	:return:
	parameters：final parameters:(W,b)
	"""
	costs = []
	# initialize parameters
	parameters = initialize_parameters(layer_dims)
	if optimizer == "sgd":
		pass  # no initialization required for gradient descent
	elif optimizer == "momentum" or optimizer == "nesterov_momentum" or optimizer == "rmsprop":
		v = initialize_velocity(parameters)
	elif optimizer == "adagrad":
		G = initialize_adagrad(parameters)
	elif optimizer == "adadelta":
		s, v, delta = initialize_adadelta(parameters)
	elif optimizer == "adam":
		v, s = initialize_adam(parameters)
	t = 0 # initializing the counter required for Adam update
	seed = 0
	for i in range(0, num_iterations):
		# Define the random minibatches. We increment the seed to reshuffle differently the dataset after each epoch
		seed = seed + 1
		minibatches = random_mini_batches(X, Y, mini_batch_size, seed)
		for minibatch in minibatches:
			# Select a minibatch
			(minibatch_X, minibatch_Y) = minibatch
			# Forward propagation
			AL, caches = forward_propagation(minibatch_X, parameters)
			# Compute cost
			cost = compute_cost(AL, minibatch_Y)
			# Backward propagation
			grads = backward_propagation(AL, minibatch_Y, caches)
			if optimizer == "sgd":
				parameters = update_parameters_with_gd(parameters, grads, learning_rate)
			elif optimizer == "momentum":
				parameters = update_parameters_with_momentum(parameters, grads, v, beta, learning_rate)
			elif optimizer == "nesterov_momentum":
				parameters = update_parameters_with_nesterov_momentum(parameters, grads, v, beta, learning_rate)
			elif optimizer == "adagrad":
				parameters = update_parameters_with_adagrad(parameters,grads,G,learning_rate,epsilon)
			elif optimizer == "adadelta":
				parameters = update_parameters_with_adadelta(parameters,grads,beta,s,v,delta,epsilon)
			elif optimizer == "rmsprop":
				parameters = update_parameters_with_rmsprop(parameters, grads, v, beta, learning_rate, epsilon)
			elif optimizer == "adam":
				t += 1
				parameters = update_parameters_with_adam(parameters, grads, v, s, t, learning_rate, beta, beta2, epsilon)

		if i % 100 == 0:
			print("Cost after iteration {}: {}".format(i, cost))
			costs.append(cost)
	print('length of cost')
	print(len(costs))
	plt.clf()
	plt.plot(costs, label = optimizer)
	plt.xlabel("iterations(hundreds)")  # 横坐标名字
	plt.ylabel("cost")  # 纵坐标名字
	plt.legend(loc="best")
	plt.show()
	return parameters

#predict function
def predict(X_test,y_test,parameters):
	"""
	:param X:
	:param y:
	:param parameters:
	:return:
	"""
	m = y_test.shape[1]
	Y_prediction = np.zeros((1, m))
	prob, caches = forward_propagation(X_test,parameters)
	for i in range(prob.shape[1]):
		# Convert probabilities A[0,i] to actual predictions p[0,i]
		if prob[0, i] > 0.5:
			Y_prediction[0, i] = 1
		else:
			Y_prediction[0, i] = 0
	accuracy = 1- np.mean(np.abs(Y_prediction - y_test))
	return accuracy
#DNN model
def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0005, num_iterations=10000,optimizer = 'sgd', beta = 0.9, beta2 = 0.999, mini_batch_size = 64,epsilon = 1e-8):
	parameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations, optimizer, beta, beta2, mini_batch_size, epsilon)
	accuracy = predict(X_test,y_test,parameters)
	return accuracy

if __name__ == "__main__":
	X_data, y_data = load_breast_cancer(return_X_y=True)
	X_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)
	X_train = X_train.T
	y_train = y_train.reshape(y_train.shape[0], -1).T
	X_test = X_test.T
	y_test = y_test.reshape(y_test.shape[0], -1).T
	# #mini-batch
	# accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], num_iterations=10000)
	# print(accuracy)
	# # momentum
	# accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], num_iterations=10000, optimizer='momentum')
	# print(accuracy)
	# nesterov momentum
	# accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], learning_rate= 0.0001,num_iterations=10000,optimizer='nesterov_momentum')
	# print(accuracy)
	#adagrad
	# accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], learning_rate= 0.01,num_iterations=10000,optimizer='adagrad')
	# print(accuracy)
	#adadelta
	# accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1],num_iterations=10000, beta= 0.9, epsilon=1e-6, optimizer='adadelta')
	# print(accuracy)
	# #RMSprop
	# accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], learning_rate=0.001, num_iterations=10000, beta=0.9,epsilon=1e-6, optimizer='rmsprop')
	# print(accuracy)
	#adam
	accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], learning_rate=0.001, num_iterations=10000, beta=0.9, beta2=0.999, epsilon=1e-8, optimizer='adam')
	print(accuracy)

================================================
FILE: dinos.txt
================================================
Aachenosaurus
Aardonyx
Abdallahsaurus
Abelisaurus
Abrictosaurus
Abrosaurus
Abydosaurus
Acanthopholis
Achelousaurus
Acheroraptor
Achillesaurus
Achillobator
Acristavus
Acrocanthosaurus
Acrotholus
Actiosaurus
Adamantisaurus
Adasaurus
Adelolophus
Adeopapposaurus
Aegyptosaurus
Aeolosaurus
Aepisaurus
Aepyornithomimus
Aerosteon
AetonyxAfromimus
Afrovenator
Agathaumas
Aggiosaurus
Agilisaurus
Agnosphitys
Agrosaurus
Agujaceratops
Agustinia
Ahshislepelta
Airakoraptor
Ajancingenia
Ajkaceratops
Alamosaurus
Alaskacephale
Albalophosaurus
Albertaceratops
Albertadromeus
Albertavenator
Albertonykus
Albertosaurus
Albinykus
Albisaurus
Alcovasaurus
Alectrosaurus
Aletopelta
Algoasaurus
Alioramus
Aliwalia
Allosaurus
Almas
Alnashetri
Alocodon
Altirhinus
Altispinax
Alvarezsaurus
Alwalkeria
Alxasaurus
Amargasaurus
Amargastegos
Amargatitanis
Amazonsaurus
Ammosaurus
Ampelosaurus
Amphicoelias
Amphicoelicaudia
Amphisaurus
Amtocephale
Amtosaurus
Amurosaurus
Amygdalodon
Anabisetia
Anasazisaurus
Anatosaurus
Anatotitan
Anchiceratops
Anchiornis
Anchisaurus
Andesaurus
Andhrasaurus
Angaturama
Angloposeidon
Angolatitan
Angulomastacator
Aniksosaurus
Animantarx
Ankistrodon
Ankylosaurus
Anodontosaurus
Anoplosaurus
Anserimimus
Antarctopelta
Antarctosaurus
Antetonitrus
Anthodon
Antrodemus
Anzu
Aoniraptor
Aorun
Apatodon
Apatoraptor
Apatosaurus
Appalachiosaurus
Aquilops
Aragosaurus
Aralosaurus
Araucanoraptor
Archaeoceratops
Archaeodontosaurus
Archaeopteryx
Archaeoraptor
Archaeornis
Archaeornithoides
Archaeornithomimus
Arcovenator
Arctosaurus
Arcusaurus
Arenysaurus
Argentinosaurus
Argyrosaurus
Aristosaurus
Aristosuchus
Arizonasaurus
Arkansaurus
Arkharavia
Arrhinoceratops
Arstanosaurus
Asiaceratops
Asiamericana
Asiatosaurus
Astrodon
Astrodonius
Astrodontaurus
Astrophocaudia
Asylosaurus
Atacamatitan
Atlantosaurus
Atlasaurus
Atlascopcosaurus
Atrociraptor
Atsinganosaurus
Aublysodon
Aucasaurus
Augustia
Augustynolophus
Auroraceratops
Aurornis
Australodocus
Australovenator
Austrocheirus
Austroposeidon
Austroraptor
Austrosaurus
Avaceratops
Avalonia
Avalonianus
Aviatyrannis
Avimimus
Avisaurus
Avipes
Azendohsaurus
Bactrosaurus
Bagaceratops
Bagaraatan
Bahariasaurus
Bainoceratops
Bakesaurus
Balaur
Balochisaurus
Bambiraptor
Banji
Baotianmansaurus
Barapasaurus
Barilium
Barosaurus
Barrosasaurus
Barsboldia
Baryonyx
Bashunosaurus
Basutodon
Bathygnathus
Batyrosaurus
Baurutitan
Bayosaurus
Becklespinax
Beelemodon
Beibeilong
Beipiaognathus
Beipiaosaurus
Beishanlong
Bellusaurus
Belodon
Berberosaurus
Betasuchus
Bicentenaria
Bienosaurus
Bihariosaurus
Bilbeyhallorum
Bissektipelta
Bistahieversor
Blancocerosaurus
Blasisaurus
Blikanasaurus
Bolong
Bonapartenykus
Bonapartesaurus
Bonatitan
Bonitasaura
Borealopelta
Borealosaurus
Boreonykus
Borogovia
Bothriospondylus
Brachiosaurus
Brachyceratops
Brachylophosaurus
Brachypodosaurus
Brachyrophus
Brachytaenius
Brachytrachelopan
Bradycneme
Brasileosaurus
Brasilotitan
Bravoceratops
Breviceratops
Brohisaurus
Brontomerus
Brontoraptor
Brontosaurus
Bruhathkayosaurus
Bugenasaura
Buitreraptor
Burianosaurus
Buriolestes
Byranjaffia
Byronosaurus
Caenagnathasia
Caenagnathus
Calamosaurus
Calamospondylus
Calamospondylus
Callovosaurus
Camarasaurus
Camarillasaurus
Camelotia
Camposaurus
Camptonotus
Camptosaurus
Campylodon
Campylodoniscus
Canardia
Capitalsaurus
Carcharodontosaurus
Cardiodon
Carnotaurus
Caseosaurus
Cathartesaura
Cathetosaurus
Caudipteryx
Caudocoelus
Caulodon
Cedarosaurus
Cedarpelta
Cedrorestes
Centemodon
Centrosaurus
Cerasinops
Ceratonykus
Ceratops
Ceratosaurus
Cetiosauriscus
Cetiosaurus
Changchunsaurus
Changdusaurus
Changyuraptor
Chaoyangsaurus
Charonosaurus
Chasmosaurus
Chassternbergia
Chebsaurus
Chenanisaurus
Cheneosaurus
Chialingosaurus
Chiayusaurus
Chienkosaurus
Chihuahuasaurus
Chilantaisaurus
Chilesaurus
Chindesaurus
Chingkankousaurus
Chinshakiangosaurus
Chirostenotes
Choconsaurus
Chondrosteosaurus
Chromogisaurus
Chuandongocoelurus
Chuanjiesaurus
Chuanqilong
Chubutisaurus
Chungkingosaurus
Chuxiongosaurus
Cinizasaurus
Cionodon
Citipati
Cladeiodon
Claorhynchus
Claosaurus
Clarencea
Clasmodosaurus
Clepsysaurus
Coahuilaceratops
Coelophysis
Coelosaurus
Coeluroides
Coelurosauravus
Coelurus
Colepiocephale
Coloradia
Coloradisaurus
Colossosaurus
Comahuesaurus
Comanchesaurus
Compsognathus
Compsosuchus
Concavenator
Conchoraptor
Condorraptor
Coronosaurus
Corythoraptor
Corythosaurus
Craspedodon
Crataeomus
Craterosaurus
Creosaurus
Crichtonpelta
Crichtonsaurus
Cristatusaurus
Crosbysaurus
Cruxicheiros
Cryolophosaurus
Cryptodraco
Cryptoraptor
Cryptosaurus
Cryptovolans
Cumnoria
Daanosaurus
Dacentrurus
Dachongosaurus
Daemonosaurus
Dahalokely
Dakosaurus
Dakotadon
Dakotaraptor
Daliansaurus
Damalasaurus
Dandakosaurus
Danubiosaurus
Daptosaurus
Darwinsaurus
Dashanpusaurus
Daspletosaurus
Dasygnathoides
Dasygnathus
Datanglong
Datonglong
Datousaurus
Daurosaurus
Daxiatitan
Deinocheirus
Deinodon
Deinonychus
Delapparentia
Deltadromeus
Demandasaurus
Denversaurus
Deuterosaurus
Diabloceratops
Diamantinasaurus
Dianchungosaurus
Diceratops
DiceratusDiclonius
Dicraeosaurus
DidanodonDilong
Dilophosaurus
Diluvicursor
Dimodosaurus
Dinheirosaurus
Dinodocus
Dinotyrannus
Diplodocus
Diplotomodon
Diracodon
Dolichosuchus
Dollodon
Domeykosaurus
Dongbeititan
Dongyangopelta
Dongyangosaurus
Doratodon
Doryphorosaurus
Draconyx
Dracopelta
Dracoraptor
Dracorex
Dracovenator
Dravidosaurus
Dreadnoughtus
Drinker
Dromaeosauroides
Dromaeosaurus
Dromiceiomimus
Dromicosaurus
Drusilasaura
Dryosaurus
Dryptosauroides
Dryptosaurus
Dubreuillosaurus
Duriatitan
Duriavenator
Dynamosaurus
Dyoplosaurus
Dysalotosaurus
Dysganus
Dyslocosaurus
Dystrophaeus
Dystylosaurus
Echinodon
Edmarka
Edmontonia
Edmontosaurus
Efraasia
Einiosaurus
Ekrixinatosaurus
Elachistosuchus
Elaltitan
Elaphrosaurus
Elmisaurus
Elopteryx
Elosaurus
Elrhazosaurus
Elvisaurus
Emausaurus
Embasaurus
Enigmosaurus
Eoabelisaurus
Eobrontosaurus
Eocarcharia
Eoceratops
Eocursor
Eodromaeus
Eohadrosaurus
Eolambia
Eomamenchisaurus
Eoplophysis
Eoraptor
Eosinopteryx
Eotrachodon
Eotriceratops
Eotyrannus
Eousdryosaurus
Epachthosaurus
Epanterias
Ephoenosaurus
Epicampodon
Epichirostenotes
Epidendrosaurus
Epidexipteryx
Equijubus
Erectopus
Erketu
Erliansaurus
Erlikosaurus
Eshanosaurus
Euacanthus
Eucamerotus
Eucentrosaurus
Eucercosaurus
Eucnemesaurus
Eucoelophysis
Eugongbusaurus
Euhelopus
Euoplocephalus
Eupodosaurus
Eureodon
Eurolimnornis
Euronychodon
Europasaurus
Europatitan
Europelta
Euskelosaurus
Eustreptospondylus
Fabrosaurus
Falcarius
Fendusaurus
Fenestrosaurus
Ferganasaurus
Ferganastegos
Ferganocephale
Foraminacephale
Fosterovenator
Frenguellisaurus
Fruitadens
Fukuiraptor
Fukuisaurus
Fukuititan
Fukuivenator
Fulengia
Fulgurotherium
Fusinasus
Fusuisaurus
Futabasaurus
Futalognkosaurus
Gadolosaurus
Galeamopus
Galesaurus
Gallimimus
Galtonia
Galveosaurus
Galvesaurus
Gannansaurus
Gansutitan
Ganzhousaurus
Gargoyleosaurus
Garudimimus
Gasosaurus
Gasparinisaura
Gastonia
Gavinosaurus
Geminiraptor
Genusaurus
Genyodectes
Geranosaurus
Gideonmantellia
Giganotosaurus
Gigantoraptor
Gigantosaurus
Gigantosaurus
Gigantoscelus
Gigantspinosaurus
Gilmoreosaurus
Ginnareemimus
Giraffatitan
Glacialisaurus
Glishades
Glyptodontopelta
Skeleton
Gobiceratops
Gobisaurus
Gobititan
Gobivenator
Godzillasaurus
Gojirasaurus
Gondwanatitan
Gongbusaurus
Gongpoquansaurus
Gongxianosaurus
Gorgosaurus
Goyocephale
Graciliceratops
Graciliraptor
Gracilisuchus
Gravitholus
Gresslyosaurus
Griphornis
Griphosaurus
Gryphoceratops
Gryponyx
Gryposaurus
Gspsaurus
Guaibasaurus
Gualicho
Guanlong
Gwyneddosaurus
Gyposaurus
Hadrosauravus
Hadrosaurus
Haestasaurus
Hagryphus
Hallopus
Halszkaraptor
Halticosaurus
Hanssuesia
Hanwulosaurus
Haplocanthosaurus
Haplocanthus
Haplocheirus
Harpymimus
Haya
Hecatasaurus
Heilongjiangosaurus
Heishansaurus
Helioceratops
Helopus
Heptasteornis
Herbstosaurus
Herrerasaurus
Hesperonychus
Hesperosaurus
Heterodontosaurus
Heterosaurus
Hexing
Hexinlusaurus
Heyuannia
Hierosaurus
Hippodraco
Hironosaurus
Hisanohamasaurus
Histriasaurus
Homalocephale
Honghesaurus
Hongshanosaurus
Hoplitosaurus
Hoplosaurus
Horshamosaurus
Hortalotarsus
Huabeisaurus
Hualianceratops
Huanansaurus
Huanghetitan
Huangshanlong
Huaxiagnathus
Huaxiaosaurus
Huaxiasaurus
Huayangosaurus
Hudiesaurus
Huehuecanauhtlus
Hulsanpes
Hungarosaurus
Huxleysaurus
Hylaeosaurus
HylosaurusHypacrosaurus
Hypselorhachis
Hypselosaurus
Hypselospinus
Hypsibema
Hypsilophodon
Hypsirhophus
habodcraniosaurus
Ichthyovenator
Ignavusaurus
Iguanacolossus
Iguanodon
Iguanoides
Skeleton
Iguanosaurus
Iliosuchus
Ilokelesia
Incisivosaurus
Indosaurus
Indosuchus
Ingenia
Inosaurus
Irritator
Isaberrysaura
Isanosaurus
Ischioceratops
Ischisaurus
Ischyrosaurus
Isisaurus
Issasaurus
Itemirus
Iuticosaurus
Jainosaurus
Jaklapallisaurus
Janenschia
Jaxartosaurus
Jeholosaurus
Jenghizkhan
Jensenosaurus
Jeyawati
Jianchangosaurus
Jiangjunmiaosaurus
Jiangjunosaurus
Jiangshanosaurus
Jiangxisaurus
Jianianhualong
Jinfengopteryx
Jingshanosaurus
Jintasaurus
Jinzhousaurus
Jiutaisaurus
Jobaria
Jubbulpuria
Judiceratops
Jurapteryx
Jurassosaurus
Juratyrant
Juravenator
Kagasaurus
Kaijiangosaurus
Kakuru
Kangnasaurus
Karongasaurus
Katepensaurus
Katsuyamasaurus
Kayentavenator
Kazaklambia
Kelmayisaurus
KemkemiaKentrosaurus
Kentrurosaurus
Kerberosaurus
Kentrosaurus
Khaan
Khetranisaurus
Kileskus
Kinnareemimus
Kitadanisaurus
Kittysaurus
KlamelisaurusKol
Koparion
Koreaceratops
Koreanosaurus
Koreanosaurus
Koshisaurus
Kosmoceratops
Kotasaurus
Koutalisaurus
Kritosaurus
Kryptops
Krzyzanowskisaurus
Kukufeldia
Kulceratops
Kulindadromeus
Kulindapteryx
Kunbarrasaurus
Kundurosaurus
Kunmingosaurus
Kuszholia
Labocania
Labrosaurus
Laelaps
Laevisuchus
Lagerpeton
Lagosuchus
Laiyangosaurus
Lamaceratops
Lambeosaurus
Lametasaurus
Lamplughsaura
Lanasaurus
Lancangosaurus
Lancanjiangosaurus
Lanzhousaurus
Laosaurus
Lapampasaurus
Laplatasaurus
Lapparentosaurus
Laquintasaura
Latenivenatrix
Latirhinus
Leaellynasaura
Leinkupal
Leipsanosaurus
Lengosaurus
Leonerasaurus
Lepidocheirosaurus
Lepidus
Leptoceratops
Leptorhynchos
Leptospondylus
Leshansaurus
Lesothosaurus
Lessemsaurus
Levnesovia
Lewisuchus
Lexovisaurus
Leyesaurus
Liaoceratops
Liaoningosaurus
Liaoningtitan
Liaoningvenator
Liassaurus
Libycosaurus
Ligabueino
Ligabuesaurus
Ligomasaurus
Likhoelesaurus
Liliensternus
Limaysaurus
Limnornis
Limnosaurus
Limusaurus
Linhenykus
Linheraptor
Linhevenator
Lirainosaurus
LisboasaurusLiubangosaurus
Lohuecotitan
Loncosaurus
Longisquama
Longosaurus
Lophorhothon
Lophostropheus
Loricatosaurus
Loricosaurus
Losillasaurus
Lourinhanosaurus
Lourinhasaurus
Luanchuanraptor
Luanpingosaurus
Lucianosaurus
Lucianovenator
Lufengosaurus
Lukousaurus
Luoyanggia
Lurdusaurus
Lusitanosaurus
Lusotitan
Lycorhinus
Lythronax
Macelognathus
Machairasaurus
Machairoceratops
Macrodontophion
Macrogryphosaurus
Macrophalangia
Macroscelosaurus
Macrurosaurus
Madsenius
Magnapaulia
Magnamanus
Magnirostris
Magnosaurus
Magulodon
Magyarosaurus
Mahakala
Maiasaura
Majungasaurus
Majungatholus
Malarguesaurus
Malawisaurus
Maleevosaurus
Maleevus
Mamenchisaurus
Manidens
Mandschurosaurus
Manospondylus
Mantellisaurus
Mantellodon
Mapusaurus
Marasuchus
Marisaurus
Marmarospondylus
Marshosaurus
Martharaptor
Masiakasaurus
Massospondylus
Matheronodon
Maxakalisaurus
Medusaceratops
Megacervixosaurus
Megadactylus
Megadontosaurus
Megalosaurus
Megapnosaurus
Megaraptor
Mei
Melanorosaurus
Mendozasaurus
Mercuriceratops
Meroktenos
Metriacanthosaurus
Microcephale
Microceratops
Microceratus
Microcoelus
Microdontosaurus
Microhadrosaurus
Micropachycephalosaurus
Microraptor
Microvenator
Mierasaurus
Mifunesaurus
Minmi
Minotaurasaurus
Miragaia
Mirischia
Moabosaurus
Mochlodon
Mohammadisaurus
Mojoceratops
Mongolosaurus
Monkonosaurus
Monoclonius
Monolophosaurus
Mononychus
Mononykus
Montanoceratops
Morelladon
Morinosaurus
Morosaurus
Morrosaurus
Mosaiceratops
Moshisaurus
Mtapaiasaurus
Mtotosaurus
Murusraptor
Mussaurus
Muttaburrasaurus
Muyelensaurus
Mymoorapelta
Naashoibitosaurus
Nambalia
Nankangia
Nanningosaurus
Nanosaurus
Nanotyrannus
Nanshiungosaurus
Nanuqsaurus
Nanyangosaurus
Narambuenatitan
Nasutoceratops
Natronasaurus
Nebulasaurus
Nectosaurus
Nedcolbertia
Nedoceratops
Neimongosaurus
Nemegtia
Nemegtomaia
Nemegtosaurus
Neosaurus
Neosodon
Neovenator
Neuquenraptor
Neuquensaurus
Newtonsaurus
Ngexisaurus
Nicksaurus
Nigersaurus
Ningyuansaurus
Niobrarasaurus
Nipponosaurus
Noasaurus
Nodocephalosaurus
Nodosaurus
Nomingia
Nopcsaspondylus
Normanniasaurus
Nothronychus
Notoceratops
Notocolossus
Notohypsilophodon
Nqwebasaurus
Nteregosaurus
Nurosaurus
Nuthetes
Nyasasaurus
Nyororosaurus
Ohmdenosaurus
Ojoceratops
Ojoraptorsaurus
Oligosaurus
Olorotitan
Omeisaurus
Omosaurus
Onychosaurus
Oohkotokia
Opisthocoelicaudia
Oplosaurus
Orcomimus
OrinosaurusOrkoraptor
OrnatotholusOrnithodesmus
Ornithoides
Ornitholestes
Ornithomerus
Ornithomimoides
Ornithomimus
Ornithopsis
Ornithosuchus
Ornithotarsus
Orodromeus
Orosaurus
Orthogoniosaurus
Orthomerus
Oryctodromeus
Oshanosaurus
Osmakasaurus
Ostafrikasaurus
Ostromia
Othnielia
Othnielosaurus
Otogosaurus
Ouranosaurus
Overosaurus
Oviraptor
Ovoraptor
Owenodon
Oxalaia
Ozraptor
Pachycephalosaurus
Pachyrhinosaurus
Pachysauriscus
Pachysaurops
Pachysaurus
Pachyspondylus
Pachysuchus
Padillasaurus
Pakisaurus
Palaeoctonus
Palaeocursornis
Palaeolimnornis
Palaeopteryx
Palaeosauriscus
Palaeosaurus
Palaeosaurus
Palaeoscincus
Paleosaurus
Paludititan
Paluxysaurus
Pampadromaeus
Pamparaptor
Panamericansaurus
Pandoravenator
Panguraptor
Panoplosaurus
Panphagia
Pantydraco
Paraiguanodon
Paralititan
Paranthodon
Pararhabdodon
Parasaurolophus
Pareiasaurus
Parksosaurus
Paronychodon
Parrosaurus
Parvicursor
Patagonykus
Patagosaurus
Patagotitan
Pawpawsaurus
Pectinodon
Pedopenna
Pegomastax
Peishansaurus
Pekinosaurus
Pelecanimimus
Pellegrinisaurus
Peloroplites
Pelorosaurus
Peltosaurus
Penelopognathus
Pentaceratops
Petrobrasaurus
Phaedrolosaurus
Philovenator
Phuwiangosaurus
Phyllodon
Piatnitzkysaurus
Picrodon
Pinacosaurus
Pisanosaurus
Pitekunsaurus
Piveteausaurus
Planicoxa
Plateosauravus
Plateosaurus
Platyceratops
Plesiohadros
Pleurocoelus
Pleuropeltus
Pneumatoarthrus
Pneumatoraptor
Podokesaurus
Poekilopleuron
Polacanthoides
Polacanthus
Polyodontosaurus
Polyonax
Ponerosteus
Poposaurus
Parasaurolophus
Postosuchus
Powellvenator
Pradhania
Prenocephale
Prenoceratops
Priconodon
Priodontognathus
Proa
Probactrosaurus
Probrachylophosaurus
Proceratops
Proceratosaurus
Procerosaurus
Procerosaurus
Procheneosaurus
Procompsognathus
Prodeinodon
Proiguanodon
Propanoplosaurus
Proplanicoxa
Prosaurolophus
Protarchaeopteryx
Protecovasaurus
Protiguanodon
Protoavis
Protoceratops
Protognathosaurus
Protognathus
Protohadros
Protorosaurus
Protorosaurus
Protrachodon
Proyandusaurus
Pseudolagosuchus
Psittacosaurus
Pteropelyx
Pterospondylus
Puertasaurus
Pukyongosaurus
Pulanesaura
Pycnonemosaurus
Pyroraptor
Qantassaurus
Qianzhousaurus
Qiaowanlong
Qijianglong
Qinlingosaurus
Qingxiusaurus
Qiupalong
Quaesitosaurus
Quetecsaurus
Quilmesaurus
Rachitrema
Rahiolisaurus
Rahona
Rahonavis
Rajasaurus
Rapator
Rapetosaurus
Raptorex
Ratchasimasaurus
Rativates
Rayososaurus
Razanandrongobe
Rebbachisaurus
Regaliceratops
Regnosaurus
Revueltosaurus
Rhabdodon
Rhadinosaurus
Rhinorex
Rhodanosaurus
Rhoetosaurus
Rhopalodon
Riabininohadros
Richardoestesia
Rileya
Rileyasuchus
Rinchenia
Rinconsaurus
Rioarribasaurus
Riodevasaurus
Riojasaurus
Riojasuchus
Rocasaurus
Roccosaurus
Rubeosaurus
Ruehleia
Rugocaudia
Rugops
Rukwatitan
Ruyangosaurus
Sacisaurus
Sahaliyania
Saichania
Saldamosaurus
Salimosaurus
Saltasaurus
Saltopus
Saltriosaurus
Sanchusaurus
Sangonghesaurus
Sanjuansaurus
Sanpasaurus
Santanaraptor
Saraikimasoom
Sarahsaurus
Sarcolestes
Sarcosaurus
Sarmientosaurus
Saturnalia
Sauraechinodon
Saurolophus
Sauroniops
Sauropelta
Saurophaganax
Saurophagus
Sauroplites
Sauroposeidon
Saurornithoides
Saurornitholestes
Savannasaurus
Scansoriopteryx
Scaphonyx
Scelidosaurus
Scipionyx
Sciurumimus
Scleromochlus
Scolosaurus
Scutellosaurus
Secernosaurus
Sefapanosaurus
Segisaurus
Segnosaurus
Seismosaurus
Seitaad
Selimanosaurus
Sellacoxa
Sellosaurus
Serendipaceratops
Serikornis
Shamosaurus
Shanag
Shanshanosaurus
Shantungosaurus
Shanxia
Shanyangosaurus
Shaochilong
Shenzhousaurus
Shidaisaurus
Shingopana
Shixinggia
Shuangbaisaurus
Shuangmiaosaurus
Shunosaurus
Shuvosaurus
Shuvuuia
Siamodon
Siamodracon
Siamosaurus
Siamotyrannus
Siats
Sibirosaurus
Sibirotitan
Sidormimus
Sigilmassasaurus
Silesaurus
Siluosaurus
Silvisaurus
Similicaudipteryx
Sinocalliopteryx
Sinoceratops
Sinocoelurus
Sinopelta
Sinopeltosaurus
Sinornithoides
Sinornithomimus
Sinornithosaurus
Sinosauropteryx
Sinosaurus
Sinotyrannus
Sinovenator
Sinraptor
Sinusonasus
Sirindhorna
Skorpiovenator
Smilodon
Sonidosaurus
Sonorasaurus
Soriatitan
Sphaerotholus
Sphenosaurus
Sphenospondylus
Spiclypeus
Spinophorosaurus
Spinops
Spinosaurus
Spinostropheus
Spinosuchus
Spondylosoma
Squalodon
Staurikosaurus
Stegoceras
Stegopelta
Stegosaurides
Stegosaurus
Stenonychosaurus
Stenopelix
Stenotholus
Stephanosaurus
Stereocephalus
Sterrholophus
Stokesosaurus
Stormbergia
Strenusaurus
Streptospondylus
Struthiomimus
Struthiosaurus
Stygimoloch
Stygivenator
Styracosaurus
Succinodon
Suchomimus
Suchosaurus
Suchoprion
Sugiyamasaurus
Skeleton
Sulaimanisaurus
Supersaurus
Suuwassea
Suzhousaurus
Symphyrophus
Syngonosaurus
Syntarsus
Syrmosaurus
Szechuanosaurus
Tachiraptor
Talarurus
Talenkauen
Talos
Tambatitanis
Tangvayosaurus
Tanius
Tanycolagreus
Tanystropheus
Tanystrosuchus
Taohelong
Tapinocephalus
Tapuiasaurus
Tarascosaurus
Tarbosaurus
Tarchia
Tastavinsaurus
Tatankacephalus
Tatankaceratops
Tataouinea
Tatisaurus
Taurovenator
Taveirosaurus
Tawa
Tawasaurus
Tazoudasaurus
Technosaurus
Tecovasaurus
Tehuelchesaurus
Teihivenator
Teinurosaurus
Teleocrater
Telmatosaurus
Tenantosaurus
Tenchisaurus
Tendaguria
Tengrisaurus
Tenontosaurus
Teratophoneus
Teratosaurus
Termatosaurus
Tethyshadros
Tetragonosaurus
Texacephale
Texasetes
Teyuwasu
Thecocoelurus
Thecodontosaurus
Thecospondylus
Theiophytalia
Therizinosaurus
Therosaurus
Thescelosaurus
Thespesius
Thotobolosaurus
Tianchisaurus
Tianchungosaurus
Tianyulong
Tianyuraptor
Tianzhenosaurus
Tichosteus
Tienshanosaurus
Timimus
Timurlengia
Titanoceratops
Titanosaurus
Titanosaurus
Tochisaurus
Tomodon
Tonganosaurus
Tongtianlong
Tonouchisaurus
Torilion
Tornieria
Torosaurus
Torvosaurus
Tototlmimus
Trachodon
Traukutitan
Trialestes
Triassolestes
Tribelesodon
Triceratops
Trigonosaurus
Trimucrodon
Trinisaura
Triunfosaurus
Troodon
Tsaagan
Tsagantegia
Tsintaosaurus
Tugulusaurus
Tuojiangosaurus
Turanoceratops
Turiasaurus
Tylocephale
Tylosteus
Tyrannosaurus
Tyrannotitan
Illustration
Uberabatitan
Udanoceratops
Ugrosaurus
Ugrunaaluk
Uintasaurus
Ultrasauros
Ultrasaurus
Ultrasaurus
Umarsaurus
Unaysaurus
Unenlagia
Unescoceratops
Unicerosaurus
Unquillosaurus
Urbacodon
Utahceratops
Utahraptor
Uteodon
Vagaceratops
Vahiny
Valdoraptor
Valdosaurus
Variraptor
Velociraptor
Vectensia
Vectisaurus
Velafrons
Velocipes
Velociraptor
Velocisaurus
Venaticosuchus
Venenosaurus
Veterupristisaurus
Viavenator
Vitakridrinda
Vitakrisaurus
Volkheimeria
Vouivria
Vulcanodon
Wadhurstia
Wakinosaurus
Walgettosuchus
Walkeria
Walkersaurus
Wangonisaurus
Wannanosaurus
Wellnhoferia
Wendiceratops
Wiehenvenator
Willinakaqe
Wintonotitan
Wuerhosaurus
Wulagasaurus
Wulatelong
Wyleyia
Wyomingraptor
Xenoceratops
Xenoposeidon
Xenotarsosaurus
Xianshanosaurus
Xiaosaurus
Xingxiulong
Xinjiangovenator
Xinjiangtitan
Xiongguanlong
Xixianykus
Xixiasaurus
Xixiposaurus
Xuanhanosaurus
Xuanhuaceratops
Xuanhuasaurus
Xuwulong
Yaleosaurus
Yamaceratops
Yandusaurus
Yangchuanosaurus
Yaverlandia
Yehuecauhceratops
Yezosaurus
Yibinosaurus
Yimenosaurus
Yingshanosaurus
Yinlong
Yixianosaurus
Yizhousaurus
Yongjinglong
Yuanmouraptor
Yuanmousaurus
Yueosaurus
Yulong
Yunganglong
Yunmenglong
Yunnanosaurus
Yunxianosaurus
Yurgovuchia
Yutyrannus
Zanabazar
Zanclodon
Zapalasaurus
Zapsalis
Zaraapelta
ZatomusZby
Zephyrosaurus
Zhanghenglong
Zhejiangosaurus
Zhenyuanlong
Zhongornis
Zhongjianosaurus
Zhongyuansaurus
Zhuchengceratops
Zhuchengosaurus
Zhuchengtitan
Zhuchengtyrannus
Ziapelta
Zigongosaurus
Zizhongosaurus
Zuniceratops
Zunityrannus
Zuolong
Zuoyunlong
Zupaysaurus
Zuul

================================================
FILE: gradient_checking.py
================================================
import numpy as np
from sklearn.datasets import  load_breast_cancer
from sklearn.model_selection import train_test_split

#initialize parameters(w,b)
def initialize_parameters(layer_dims):
	"""
	:param layer_dims: list,每一层单元的个数（维度）
	:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL
	"""
	np.random.seed(1)
	L = len(layer_dims)#the number of layers in the network
	parameters = {}
	for l in range(1,L):
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01
		# parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization
		# parameters["W" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果
		parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization
		parameters["b" + str(l)] = np.zeros((layer_dims[l],1))
	return parameters

def relu(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	A: output of activation
	"""
	A = np.maximum(0,Z)
	return A

#implement the activation function(ReLU and sigmoid)
def sigmoid(Z):
	"""
	:param Z: Output of the linear layer
	:return:
	"""
	A = 1 / (1 + np.exp(-Z))
	return A

def forward_propagation(X, parameters):
	"""
	X -- input dataset, of shape (input size, number of examples)
    parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2",...,"WL", "bL"
                    W -- weight matrix of shape (size of current layer, size of previous layer)
                    b -- bias vector of shape (size of current layer,1)
    :return:
	AL: the output of the last Layer(y_predict)
	caches: list, every element is a tuple:(W,b,z,A_pre)
	"""
	L = len(parameters) // 2  # number of layer
	A = X
	caches = [(None,None,None,X)]  # 第0层(None,None,None,A0) w,b,z用none填充,下标与层数一致，用于存储每一层的，w,b,z,A
	# calculate from 1 to L-1 layer
	for l in range(1,L):
		A_pre = A
		W = parameters["W" + str(l)]
		b = parameters["b" + str(l)]
		z = np.dot(W,A_pre) + b #计算z = wx + b
		A = relu(z) #relu activation function
		caches.append((W,b,z,A))
	# calculate Lth layer
	WL = parameters["W" + str(L)]
	bL = parameters["b" + str(L)]
	zL = np.dot(WL,A) + bL
	AL = sigmoid(zL)
	caches.append((WL,bL,zL,AL))
	return AL, caches

#calculate cost function
def compute_cost(AL,Y):
	"""
	:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)
	:param Y:真实值,shape:(1, number of examples)
	:return:
	"""
	m = Y.shape[1]
	cost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) + np.multiply(-np.log(1 - AL), 1 - Y))
	#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2
	cost = np.squeeze(cost)
	return cost


# derivation of relu
def relu_backward(Z):
	"""
	:param Z: the input of activation
	:return:
	"""
	dA = np.int64(Z > 0)
	return dA

def backward_propagation(AL, Y, caches):
	"""
	Implement the backward propagation presented in figure 2.
	Arguments:
	X -- input dataset, of shape (input size, number of examples)
	Y -- true "label" vector (containing 0 if cat, 1 if non-cat)
	caches -- caches output from forward_propagation(),(W,b,z,A)

	Returns:
	gradients -- A dictionary with the gradients with respect to dW,db
	"""
	m = Y.shape[1]
	L = len(caches) - 1
	# print("L:   " + str(L))
	#calculate the Lth layer gradients
	prev_AL = caches[L-1][3]
	dzL = 1./m * (AL - Y)
	# print(dzL.shape)
	# print(prev_AL.T.shape)
	dWL = np.dot(dzL, prev_AL.T)
	dbL = np.sum(dzL, axis=1, keepdims=True)
	gradients = {"dW"+str(L):dWL, "db"+str(L):dbL}
	#calculate from L-1 to 1 layer gradients
	for l in reversed(range(1,L)): # L-1,L-3,....,1
		post_W= caches[l+1][0] #要用后一层的W
		dz = dzL #用后一层的dz

		dal = np.dot(post_W.T, dz)
		z = caches[l][2]#当前层的z
		dzl = np.multiply(dal, relu_backward(z))#可以直接用dzl = np.multiply(dal, np.int64(Al > 0))来实现
		prev_A = caches[l-1][3]#前一层的A
		dWl = np.dot(dzl, prev_A.T)
		dbl = np.sum(dzl, axis=1, keepdims=True)

		gradients["dW" + str(l)] = dWl
		gradients["db" + str(l)] = dbl
		dzL = dzl #更新dz
	return gradients

#convert parameter into vector
def dictionary_to_vector(parameters):
	"""
	Roll all our parameters dictionary into a single vector satisfying our specific required shape.
	"""
	count = 0
	for key in parameters:
		# flatten parameter
		new_vector = np.reshape(parameters[key], (-1, 1))#convert matrix into vector
		if count == 0:#刚开始时新建一个向量
			theta = new_vector
		else:
			theta = np.concatenate((theta, new_vector), axis=0)#和已有的向量合并成新向量
		count = count + 1

	return theta

#convert gradients into vector
def gradients_to_vector(gradients):
	"""
	Roll all our parameters dictionary into a single vector satisfying our specific required shape.
	"""
	# 因为gradient的存储顺序是{dWL,dbL,....dW2,db2,dW1,db1}，为了统一采用[dW1,db1,...dWL,dbL]方面后面求欧式距离（对应元素）
	L = len(gradients) // 2
	keys = []
	for l in range(L):
		keys.append("dW" + str(l + 1))
		keys.append("db" + str(l + 1))
	count = 0
	for key in keys:
		# flatten parameter
		new_vector = np.reshape(gradients[key], (-1, 1))#convert matrix into vector
		if count == 0:#刚开始时新建一个向量
			theta = new_vector
		else:
			theta = np.concatenate((theta, new_vector), axis=0)#和已有的向量合并成新向量
		count = count + 1

	return theta

#convert vector into dictionary
def vector_to_dictionary(theta, layer_dims):
	"""
    Unroll all our parameters dictionary from a single vector satisfying our specific required shape.
    """
	parameters = {}
	L = len(layer_dims)  # the number of layers in the network
	start = 0
	end = 0
	for l in range(1, L):
		end += layer_dims[l]*layer_dims[l-1]
		parameters["W" + str(l)] = theta[start:end].reshape((layer_dims[l],layer_dims[l-1]))
		start = end
		end += layer_dims[l]*1
		parameters["b" + str(l)] = theta[start:end].reshape((layer_dims[l],1))
		start = end
	return parameters


def gradient_check(parameters, gradients, X, Y, layer_dims, epsilon=1e-7):
	"""
	Checks if backward_propagation_n computes correctly the gradient of the cost output by forward_propagation_n

	Arguments:
	parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3":
	grad -- output of backward_propagation_n, contains gradients of the cost with respect to the parameters.
	x -- input datapoint, of shape (input size, 1)
	y -- true "label"
	epsilon -- tiny shift to the input to compute approximated gradient with formula(1)
	layer_dims -- the layer dimension of nn
	Returns:
	difference -- difference (2) between the approximated gradient and the backward propagation gradient
	"""

	parameters_vector = dictionary_to_vector(parameters)  # parameters_values
	grad = gradients_to_vector(gradients)
	num_parameters = parameters_vector.shape[0]
	J_plus = np.zeros((num_parameters, 1))
	J_minus = np.zeros((num_parameters, 1))
	gradapprox = np.zeros((num_parameters, 1))

	# Compute gradapprox
	for i in range(num_parameters):
		thetaplus = np.copy(parameters_vector)
		thetaplus[i] = thetaplus[i] + epsilon
		AL, _ = forward_propagation(X, vector_to_dictionary(thetaplus,layer_dims))
		J_plus[i] = compute_cost(AL,Y)

		thetaminus = np.copy(parameters_vector)
		thetaminus[i] = thetaminus[i] - epsilon
		AL, _ = forward_propagation(X, vector_to_dictionary(thetaminus, layer_dims))
		J_minus[i] = compute_cost(AL,Y)
		gradapprox[i] = (J_plus[i] - J_minus[i]) / (2 * epsilon)

	numerator = np.linalg.norm(grad - gradapprox)
	denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
	difference = numerator / denominator

	if difference > 2e-7:
		print(
			"\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m")
	else:
		print(
			"\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m")

	return difference


if __name__ == "__main__":
	X_data, y_data = load_breast_cancer(return_X_y=True)
	X_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,test_size=0.2,random_state=28)
	X_train = X_train.T
	y_train = y_train.reshape(y_train.shape[0], -1).T
	X_test = X_test.T
	y_test = y_test.reshape(y_test.shape[0], -1).T

	#根据自己实现的bp计算梯度
	parameters = initialize_parameters([X_train.shape[0],5,3,1])
	AL, caches = forward_propagation(X_train,parameters)
	cost = compute_cost(AL,y_train)
	gradients = backward_propagation(AL,y_train,caches)
	#gradient checking
	# # print(X_train.shape[0])
	difference = gradient_check(parameters, gradients, X_train, y_train,[X_train.shape[0],5,3,1])


================================================
FILE: rnn.py
================================================
import numpy as np


def initialize_parameters(n_a, n_x, n_y):
	"""
	Initialize parameters with small random values
	Returns:
	parameters -- python dictionary containing:
						Wax -- Weight matrix multiplying the input, of shape (n_a, n_x)
						Waa -- Weight matrix multiplying the hidden state, of shape (n_a, n_a)
						Wya -- Weight matrix relating the hidden-state to the output, of shape (n_y, n_a)
						b --  Bias, numpy array of shape (n_a, 1)
						by -- Bias relating the hidden-state to the output, of shape (n_y, 1)
	"""
	np.random.seed(1)
	Wax = np.random.randn(n_a, n_x) * 0.01  # input to hidden
	Waa = np.random.randn(n_a, n_a) * 0.01  # hidden to hidden
	Wya = np.random.randn(n_y, n_a) * 0.01  # hidden to output
	ba = np.zeros((n_a, 1))  # hidden bias
	by = np.zeros((n_y, 1))  # output bias
	parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "ba": ba, "by": by}

	return parameters


def softmax(x):
	#这里减去最大值，是为了防止大数溢出，根据softmax的参数冗余性，减去任意一个数，结果不变
	e_x = np.exp(x - np.max(x))
	return e_x / np.sum(e_x, axis=0)


def rnn_step_forward(xt, a_prev, parameters):
	"""
	Implements a single forward step of the RNN-cell that uses a tanh
    activation function

	Arguments:
	xt -- the input data at timestep "t", of shape (n_x, m).
	a_prev -- Hidden state at timestep "t-1", of shape (n_a, m)
	**here, n_x denotes the dimension of word vector, n_a denotes the number of hidden units in a RNN cell
	parameters -- python dictionary containing:
						Wax -- Weight matrix multiplying the input, of shape (n_a, n_x)
						Waa -- Weight matrix multiplying the hidden state, of shape (n_a, n_a)
						Wya -- Weight matrix relating the hidden-state to the output, of shape (n_y, n_a)
						ba --  Bias,of shape (n_a, 1)
						by -- Bias relating the hidden-state to the output,of shape (n_y, 1)
	Returns:
	a_next -- next hidden state, of shape (n_a, 1)
	yt_pred -- prediction at timestep "t", of shape (n_y, 1)
	"""

	# get parameters from "parameters"
	Wax = parameters["Wax"] #(n_a, n_x)
	Waa = parameters["Waa"] #(n_a, n_a)
	Wya = parameters["Wya"] #(n_y, n_a)
	ba = parameters["ba"]   #(n_a, 1)
	by = parameters["by"]   #(n_y, 1)

	a_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba) #(n_a, 1)
	yt_pred = softmax(np.dot(Wya, a_next) + by) #(n_y,1)

	return a_next, yt_pred


def rnn_forward(X, Y, a0, parameters, vocab_size=27):
	x, a, y_hat = {}, {}, {}
	a[-1] = np.copy(a0)
	# initialize your loss to 0
	loss = 0

	for t in range(len(X)):
		# Set x[t] to be the one-hot vector representation of the t'th character in X.
		# if X[t] == None, we just have x[t]=0. This is used to set the input for the first timestep to the zero vector.
		x[t] = np.zeros((vocab_size, 1))
		if (X[t] != None):
			x[t][X[t]] = 1
		# Run one step forward of the RNN
		a[t], y_hat[t] = rnn_step_forward(x[t], a[t - 1], parameters) #a[t]: (n_a,1), y_hat[t]:(n_y,1)
		# Update the loss by substracting the cross-entropy term of this time-step from it.
		#这里因为真实的label也是采用onehot表示的，因此只要把label向量中1对应的概率拿出来就行了
		loss -= np.log(y_hat[t][Y[t], 0])

	cache = (y_hat, a, x)

	return loss, cache


def rnn_step_backward(dy, gradients, parameters, x, a, a_prev):

	gradients['dWya'] += np.dot(dy, a.T)
	gradients['dby'] += dy
	Wya = parameters['Wya']
	Waa = parameters['Waa']
	da = np.dot(Wya.T, dy) + gradients['da_next']  #每个cell的Upstream有两条，一条da_next过来的，一条y_hat过来的
	dtanh = (1 - a * a) * da  # backprop through tanh nonlinearity
	gradients['dba'] += dtanh
	gradients['dWax'] += np.dot(dtanh, x.T)
	gradients['dWaa'] += np.dot(dtanh, a_prev.T)
	gradients['da_next'] = np.dot(Waa.T, dtanh)

	return gradients


def rnn_backward(X, Y, parameters, cache):
	# Initialize gradients as an empty dictionary
	gradients = {}

	# Retrieve from cache and parameters
	(y_hat, a, x) = cache
	Waa, Wax, Wya, by, ba = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['ba']

	# each one should be initialized to zeros of the same dimension as its corresponding parameter
	gradients['dWax'], gradients['dWaa'], gradients['dWya'] = np.zeros_like(Wax), np.zeros_like(Waa), np.zeros_like(Wya)
	gradients['dba'], gradients['dby'] = np.zeros_like(ba), np.zeros_like(by)
	gradients['da_next'] = np.zeros_like(a[0])

	# Backpropagate through time
	for t in reversed(range(len(X))):
		dy = np.copy(y_hat[t])
		dy[Y[t]] -= 1 #计算y_hat - y,即预测值-真实值，因为真实值是one-hot向量，只有1个1，其它都是0，
		# 所以只要在预测值（向量）对应位置减去1即可，其它位置减去0相当于没变
		gradients = rnn_step_backward(dy, gradients, parameters, x[t], a[t], a[t - 1])

	return gradients, a

#梯度裁剪
def clip(gradients, maxValue):
	'''
	Clips the gradients' values between minimum and maximum.

	Arguments:
	gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
	maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue

	Returns:
	gradients -- a dictionary with the clipped gradients.
	'''

	dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['dba'], gradients['dby']

	# clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby].
	for gradient in [dWax, dWaa, dWya, db, dby]:
		np.clip(gradient, -maxValue, maxValue, gradient)

	gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "dba": db, "dby": dby}

	return gradients



def update_parameters(parameters, gradients, lr):
	parameters['Wax'] += -lr * gradients['dWax']
	parameters['Waa'] += -lr * gradients['dWaa']
	parameters['Wya'] += -lr * gradients['dWya']
	parameters['ba']  += -lr * gradients['dba']
	parameters['by']  += -lr * gradients['dby']
	return parameters


def sample(parameters, char_to_ix, seed):
	"""
	Sample a sequence of characters according to a sequence of probability distributions output of the RNN
	Arguments:
	parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b.
	char_to_ix -- python dictionary mapping each character to an index.
	seed -- used for grading purposes. Do not worry about it.
	Returns:
	indices -- a list of length n containing the indices of the sampled characters.
	"""

	# Retrieve parameters and relevant shapes from "parameters" dictionary
	Waa, Wax, Wya, by, ba = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['ba']
	vocab_size = by.shape[0]
	n_a = Waa.shape[1]
	# Step 1: Create the one-hot vector x for the first character (initializing the sequence generation).
	x = np.zeros((vocab_size, 1))
	# Step 1': Initialize a_prev as zeros
	a_prev = np.zeros((n_a, 1))

	# Create an empty list of indices, this is the list which will contain the list of indices of the characters to generate
	indices = []

	# Idx is a flag to detect a newline character, we initialize it to -1
	idx = -1

	# Loop over time-steps t. At each time-step, sample a character from a probability distribution and append
	# its index to "indices". We'll stop if we reach 50 characters (which should be very unlikely with a well
	# trained model), which helps debugging and prevents entering an infinite loop.
	counter = 0
	newline_character = char_to_ix['\n']

	while (idx != newline_character and counter != 50):
		# Step 2: Forward propagate x using the equations (1), (2) and (3)
		a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + ba)
		z = np.dot(Wya, a) + by
		y = softmax(z)

		# for grading purposes
		np.random.seed(counter + seed)

		# Step 3: Sample the index of a character within the vocabulary from the probability distribution y
		idx = np.random.choice(vocab_size, p = y.ravel())  # 等价于np.random.choice([0,1,...,vocab_size-1], p = y.ravel())，
		# 一维数组或者int型变量，如果是数组，就按照里面的范围来进行采样，如果是单个变量，则采用np.arange(a)的形式
		# Append the index to "indices"
		indices.append(idx)
		# Step 4: Overwrite the input character as the one corresponding to the sampled index.
		#每次生成的字符是下一个时间步的输入
		x = np.zeros((vocab_size, 1))
		x[idx] = 1

		# Update "a_prev" to be "a"
		a_prev = a
		# for grading purposes
		seed += 1
		counter += 1
	if (counter == 50):
		indices.append(char_to_ix['\n'])

	return indices


def optimize(X, Y, a_prev, parameters, learning_rate=0.01):
	"""
	Execute one step of the optimization to train the model.

	Arguments:
	X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
	Y -- list of integers, exactly the same as X but shifted one index to the left.
	a_prev -- previous hidden state.
	parameters -- python dictionary containing:
						Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
						Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
						Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
						b --  Bias, numpy array of shape (n_a, 1)
						by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
	learning_rate -- learning rate for the model.

	Returns:
	loss -- value of the loss function (cross-entropy)
	gradients -- python dictionary containing:
						dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
						dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
						dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
						db -- Gradients of bias vector, of shape (n_a, 1)
						dby -- Gradients of output bias vector, of shape (n_y, 1)
	a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
	"""

	# Forward propagate through time
	loss, cache = rnn_forward(X, Y, a_prev, parameters)

	# Backpropagate through time
	gradients, a = rnn_backward(X, Y, parameters, cache)

	# Clip your gradients between -5 (min) and 5 (max)
	gradients = clip(gradients, 5)

	# Update parameters
	parameters = update_parameters(parameters, gradients, learning_rate)

	return loss, parameters, a[len(X) - 1]

def get_initial_loss(vocab_size, seq_length):
	return -np.log(1.0/vocab_size)*seq_length

def smooth(loss, cur_loss):
	return loss * 0.999 + cur_loss * 0.001

def print_sample(sample_ix, ix_to_char):
	txt = ''.join(ix_to_char[ix] for ix in sample_ix)
	txt = txt[0].upper() + txt[1:]  # capitalize first character
	print ('%s' % (txt, ), end='')


def model(data, ix_to_char, char_to_ix, num_iterations=35000, n_a=50, dino_names=7, vocab_size=27):
	"""
	Trains the model and generates dinosaur names.

	Arguments:
	data -- text corpus
	ix_to_char -- dictionary that maps the index to a character
	char_to_ix -- dictionary that maps a character to an index
	num_iterations -- number of iterations to train the model for
	n_a -- number of units of the RNN cell
	dino_names -- number of dinosaur names you want to sample at each iteration.
	vocab_size -- number of unique characters found in the text, size of the vocabulary

	Returns:
	parameters -- learned parameters
	"""

	# Retrieve n_x and n_y from vocab_size
	n_x, n_y = vocab_size, vocab_size

	# Initialize parameters
	parameters = initialize_parameters(n_a, n_x, n_y)
	# Initialize loss (this is required because we want to smooth our loss, don't worry about it)
	loss = get_initial_loss(vocab_size, dino_names)

	# Initialize the hidden state of your LSTM
	a_prev = np.zeros((n_a, 1))

	# Optimization loop
	for j in range(num_iterations):

		# Use the hint above to define one training example (X,Y) (≈ 2 lines)
		index = j % len(data)
		X = [None] + [char_to_ix[ch] for ch in data[index]]
		Y = X[1:] + [char_to_ix["\n"]]
		# Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
		# Choose a learning rate of 0.01
		curr_loss, parameters, a_prev = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
		# Use a latency trick to keep the loss smooth. It happens here to accelerate the training.
		loss = smooth(loss, curr_loss)
		# Every 2000 Iteration, generate "n" characters thanks to sample() to check if the model is learning properly
		if j % 2000 == 0:
			print('Iteration: %d, Loss: %f' % (j, loss) + '\n')
			# The number of dinosaur names to print
			seed = 0
			for name in range(dino_names):
				# Sample indices and print them
				sampled_indices = sample(parameters, char_to_ix, seed)
				print_sample(sampled_indices, ix_to_char)
				seed += 1  # To get the same result for grading purposed, increment the seed by one.

			print('\n')

	return parameters


if __name__ == "__main__":
	data = open('dinos.txt', 'r').read()
	data = data.lower()
	chars = list(set(data))  # str->set,例如:'123'转set，会转为无序不重复的，形如:{'1','2','3'}
	print(chars)
	data_size, vocab_size = len(data), len(chars)
	print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))
	char_to_ix = {ch: i for i, ch in enumerate(sorted(chars))}
	ix_to_char = {i: ch for i, ch in enumerate(sorted(chars))}
	print(ix_to_char)

	# Build list of all dinosaur names (training examples).
	with open("dinos.txt") as f:
		examples = f.readlines()
	examples = [x.lower().strip() for x in examples]

	# Shuffle list of all dinosaur names
	np.random.seed(0)
	np.random.shuffle(examples)

	parameters = model(examples, ix_to_char, char_to_ix)

Download .txt

gitextract_8bva7oa8/

├── README.md
├── batch_normalization.py
├── compare_initializations.py
├── deep_neural_network_ng.py
├── deep_neural_network_release.py
├── deep_neural_network_v1.py
├── deep_neural_network_v2.py
├── deep_neural_network_with_L2.py
├── deep_neural_network_with_dropout.py
├── deep_neural_network_with_gd.py
├── deep_neural_network_with_optimizers.py
├── dinos.txt
├── gradient_checking.py
└── rnn.py

Download .txt

SYMBOL INDEX (158 symbols across 12 files)

FILE: batch_normalization.py
  function initialize_parameters (line 9) | def initialize_parameters(layer_dims):
  function relu_forward (line 32) | def relu_forward(Z):
  function sigmoid_forward (line 42) | def sigmoid_forward(Z):
  function linear_forward (line 50) | def linear_forward(X, W, b):
  function batchnorm_forward (line 54) | def batchnorm_forward(z, gamma, beta, epsilon = 1e-12):
  function forward_propagation (line 68) | def forward_propagation(X, parameters, bn_param, decay = 0.9):
  function compute_cost (line 109) | def compute_cost(AL,Y):
  function relu_backward (line 127) | def relu_backward(dA, Z):
  function batchnorm_backward (line 135) | def batchnorm_backward(dout, cache):
  function linear_backward (line 148) | def linear_backward(dZ, cache):
  function backward_propagation (line 160) | def backward_propagation(AL, Y, caches):
  function update_parameters (line 198) | def update_parameters(parameters, grads, learning_rate):
  function random_mini_batches (line 214) | def random_mini_batches(X, Y, mini_batch_size = 64, seed=1):
  function L_layer_model (line 252) | def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations, mini_...
  function forward_propagation_for_test (line 294) | def forward_propagation_for_test(X, parameters, bn_param, epsilon = 1e-12):
  function predict (line 339) | def predict(X_test, y_test, parameters, bn_param):
  function DNN (line 359) | def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0...

FILE: compare_initializations.py
  function initialize_parameters_zeros (line 7) | def initialize_parameters_zeros(layers_dims):
  function initialize_parameters_random (line 28) | def initialize_parameters_random(layers_dims):
  function initialize_parameters_xavier (line 50) | def initialize_parameters_xavier(layers_dims):
  function initialize_parameters_he (line 72) | def initialize_parameters_he(layers_dims):
  function relu (line 94) | def relu(Z):
  function initialize_parameters (line 104) | def initialize_parameters(layer_dims):
  function forward_propagation (line 117) | def forward_propagation(initialization="he"):

FILE: deep_neural_network_ng.py
  function initialize_parameters (line 6) | def initialize_parameters(layer_dims):
  function linear_forward (line 19) | def linear_forward(A_pre,W,b):
  function relu (line 32) | def relu(Z):
  function sigmoid (line 43) | def sigmoid(Z):
  function linear_activation_forward (line 52) | def linear_activation_forward(A_pre,W,b,activation):
  function L_model_forward (line 71) | def L_model_forward(X,parameters):
  function compute_cost (line 120) | def compute_cost(AL,Y):
  function sigmoid_backward (line 132) | def sigmoid_backward(dA, Z):
  function relu_backward (line 143) | def relu_backward(dA, cache):
  function linear_backward (line 159) | def linear_backward(dZ, cache):
  function linear_activation_backward (line 172) | def linear_activation_backward(dA, cache, activation):
  function L_model_backward (line 189) | def L_model_backward(AL, Y, caches):
  function update_parameters (line 219) | def update_parameters(parameters, grads, learning_rate):
  function L_layer_model (line 233) | def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations):
  function predict (line 266) | def predict(X,y,parameters):
  function DNN (line 286) | def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0...

FILE: deep_neural_network_release.py
  function initialize_parameters (line 13) | def initialize_parameters(layer_dims):
  function linear_forward (line 29) | def linear_forward(x, w, b):
  function relu_forward (line 39) | def relu_forward(Z):
  function sigmoid (line 49) | def sigmoid(Z):
  function forward_propagation (line 57) | def forward_propagation(X, parameters):
  function compute_cost (line 87) | def compute_cost(AL,Y):
  function relu_backward (line 106) | def relu_backward(dA, Z):
  function linear_backward (line 116) | def linear_backward(dZ, cache):
  function backward_propagation (line 129) | def backward_propagation(AL, Y, caches):
  function update_parameters (line 161) | def update_parameters(parameters, grads, learning_rate):
  function L_layer_model (line 174) | def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations):
  function predict (line 209) | def predict(X_test,y_test,parameters):
  function DNN (line 229) | def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0...

FILE: deep_neural_network_v1.py
  function initialize_parameters (line 6) | def initialize_parameters(layer_dims):
  function relu (line 18) | def relu(Z):
  function sigmoid (line 27) | def sigmoid(Z):
  function forward_propagation (line 35) | def forward_propagation(X, parameters):
  function compute_cost (line 64) | def compute_cost(AL,Y):
  function relu_backward (line 80) | def relu_backward(Z):
  function backward_propagation (line 88) | def backward_propagation(AL, Y, caches):
  function update_parameters (line 127) | def update_parameters(parameters, grads, learning_rate):
  function L_layer_model (line 140) | def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations):
  function predict (line 173) | def predict(X_test,y_test,parameters):
  function DNN (line 192) | def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0...

FILE: deep_neural_network_v2.py
  function initialize_parameters (line 6) | def initialize_parameters(layer_dims):
  function relu (line 21) | def relu(Z):
  function sigmoid (line 30) | def sigmoid(Z):
  function forward_propagation (line 38) | def forward_propagation(X, parameters):
  function compute_cost (line 67) | def compute_cost(AL,Y):
  function relu_backward (line 85) | def relu_backward(Z):
  function backward_propagation (line 93) | def backward_propagation(AL, Y, caches):
  function update_parameters (line 132) | def update_parameters(parameters, grads, learning_rate):
  function L_layer_model (line 145) | def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations):
  function predict (line 180) | def predict(X_test,y_test,parameters):
  function DNN (line 199) | def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0...

FILE: deep_neural_network_with_L2.py
  function initialize_parameters (line 7) | def initialize_parameters(layer_dims):
  function relu (line 23) | def relu(Z):
  function sigmoid (line 32) | def sigmoid(Z):
  function forward_propagation (line 40) | def forward_propagation(X, parameters):
  function compute_cost (line 70) | def compute_cost(AL,Y):
  function compute_cost_with_regularization (line 83) | def compute_cost_with_regularization(AL, Y, parameters, lambd):
  function relu_backward (line 105) | def relu_backward(Z):
  function backward_propagation_with_regularization (line 113) | def backward_propagation_with_regularization(AL, Y, caches, lambd):
  function update_parameters (line 152) | def update_parameters(parameters, grads, learning_rate):
  function L_layer_model (line 165) | def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations,lambd):
  function predict (line 200) | def predict(X_test,y_test,parameters):
  function DNN (line 219) | def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0...

FILE: deep_neural_network_with_dropout.py
  function initialize_parameters (line 8) | def initialize_parameters(layer_dims):
  function relu (line 24) | def relu(Z):
  function sigmoid (line 33) | def sigmoid(Z):
  function forward_propagation (line 41) | def forward_propagation(X, parameters):
  function forward_propagation_with_dropout (line 72) | def forward_propagation_with_dropout(X, parameters, keep_prob = 0.8):
  function compute_cost (line 108) | def compute_cost(AL,Y):
  function relu_backward (line 121) | def relu_backward(Z):
  function backward_propagation_with_dropout (line 130) | def backward_propagation_with_dropout(AL, Y, caches, keep_prob = 0.8):
  function update_parameters (line 172) | def update_parameters(parameters, grads, learning_rate):
  function L_layer_model (line 185) | def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations,keep_p...
  function predict (line 220) | def predict(X_test,y_test,parameters):
  function DNN (line 240) | def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0...

FILE: deep_neural_network_with_gd.py
  function initialize_parameters (line 6) | def initialize_parameters(layer_dims):
  function relu (line 21) | def relu(Z):
  function sigmoid (line 30) | def sigmoid(Z):
  function forward_propagation (line 38) | def forward_propagation(X, parameters):
  function compute_cost (line 67) | def compute_cost(AL,Y):
  function relu_backward (line 85) | def relu_backward(Z):
  function backward_propagation (line 93) | def backward_propagation(AL, Y, caches):
  function update_parameters (line 132) | def update_parameters(parameters, grads, learning_rate):
  function random_mini_batches (line 146) | def random_mini_batches(X, Y, mini_batch_size = 64, seed=1):
  function L_layer_model (line 183) | def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations, gradi...
  function predict (line 258) | def predict(X_test,y_test,parameters):
  function DNN (line 277) | def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0...

FILE: deep_neural_network_with_optimizers.py
  function initialize_parameters (line 6) | def initialize_parameters(layer_dims):
  function relu (line 21) | def relu(Z):
  function sigmoid (line 30) | def sigmoid(Z):
  function forward_propagation (line 38) | def forward_propagation(X, parameters):
  function compute_cost (line 67) | def compute_cost(AL,Y):
  function relu_backward (line 85) | def relu_backward(Z):
  function backward_propagation (line 93) | def backward_propagation(AL, Y, caches):
  function update_parameters_with_gd (line 132) | def update_parameters_with_gd(parameters, grads, learning_rate):
  function random_mini_batches (line 146) | def random_mini_batches(X, Y, mini_batch_size = 64, seed=1):
  function initialize_velocity (line 184) | def initialize_velocity(parameters):
  function update_parameters_with_momentum (line 208) | def update_parameters_with_momentum(parameters, grads, v, beta, learning...
  function update_parameters_with_nesterov_momentum (line 250) | def update_parameters_with_nesterov_momentum(parameters, grads, v, beta,...
  function initialize_adagrad (line 293) | def initialize_adagrad(parameters):
  function update_parameters_with_adagrad (line 317) | def update_parameters_with_adagrad(parameters, grads, G, learning_rate, ...
  function initialize_adadelta (line 359) | def initialize_adadelta(parameters):
  function update_parameters_with_adadelta (line 399) | def update_parameters_with_adadelta(parameters, grads, rho, s, v, delta,...
  function update_parameters_with_rmsprop (line 453) | def update_parameters_with_rmsprop(parameters, grads, s, beta = 0.9, lea...
  function initialize_adam (line 491) | def initialize_adam(parameters):
  function update_parameters_with_adam (line 522) | def update_parameters_with_adam(parameters, grads, v, s, t, learning_rat...
  function L_layer_model (line 569) | def L_layer_model(X, Y, layer_dims, learning_rate, num_iterations, optim...
  function predict (line 637) | def predict(X_test,y_test,parameters):
  function DNN (line 656) | def DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0...

FILE: gradient_checking.py
  function initialize_parameters (line 6) | def initialize_parameters(layer_dims):
  function relu (line 22) | def relu(Z):
  function sigmoid (line 32) | def sigmoid(Z):
  function forward_propagation (line 40) | def forward_propagation(X, parameters):
  function compute_cost (line 70) | def compute_cost(AL,Y):
  function relu_backward (line 84) | def relu_backward(Z):
  function backward_propagation (line 92) | def backward_propagation(AL, Y, caches):
  function dictionary_to_vector (line 132) | def dictionary_to_vector(parameters):
  function gradients_to_vector (line 149) | def gradients_to_vector(gradients):
  function vector_to_dictionary (line 172) | def vector_to_dictionary(theta, layer_dims):
  function gradient_check (line 190) | def gradient_check(parameters, gradients, X, Y, layer_dims, epsilon=1e-7):

FILE: rnn.py
  function initialize_parameters (line 4) | def initialize_parameters(n_a, n_x, n_y):
  function softmax (line 26) | def softmax(x):
  function rnn_step_forward (line 32) | def rnn_step_forward(xt, a_prev, parameters):
  function rnn_forward (line 65) | def rnn_forward(X, Y, a0, parameters, vocab_size=27):
  function rnn_step_backward (line 88) | def rnn_step_backward(dy, gradients, parameters, x, a, a_prev):
  function rnn_backward (line 104) | def rnn_backward(X, Y, parameters, cache):
  function clip (line 127) | def clip(gradients, maxValue):
  function update_parameters (line 151) | def update_parameters(parameters, gradients, lr):
  function sample (line 160) | def sample(parameters, char_to_ix, seed):
  function optimize (line 222) | def optimize(X, Y, a_prev, parameters, learning_rate=0.01):
  function get_initial_loss (line 263) | def get_initial_loss(vocab_size, seq_length):
  function smooth (line 266) | def smooth(loss, cur_loss):
  function print_sample (line 269) | def print_sample(sample_ix, ix_to_char):
  function model (line 275) | def model(data, ix_to_char, char_to_ix, num_iterations=35000, n_a=50, di...

Download .json

Condensed preview — 14 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (164K chars).

[
  {
    "path": "README.md",
    "chars": 3741,
    "preview": "# deep-learning\npersonal practice\n---------------\n深度学习个人练习，该项目实现了深度学习中一些常用的算法，内容包括：\n\n+ 四种初始化方法：zero initialize, random i"
  },
  {
    "path": "batch_normalization.py",
    "chars": 13619,
    "preview": "# implement the batch normalization\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  loa"
  },
  {
    "path": "compare_initializations.py",
    "chars": 4799,
    "preview": "\n#对比几种初始化方法\nimport numpy as np\nimport matplotlib.pyplot as plt\n\n#初始化为0\ndef initialize_parameters_zeros(layers_dims):\n\t\"\""
  },
  {
    "path": "deep_neural_network_ng.py",
    "chars": 9136,
    "preview": "import numpy as np\nfrom machine_learning.deep_neural_network.init_utils import load_dataset\nimport matplotlib.pyplot as "
  },
  {
    "path": "deep_neural_network_release.py",
    "chars": 7172,
    "preview": "\"\"\"\n把各部分分离出来，降低耦合度，使得结构更加清晰\n\"\"\"\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_b"
  },
  {
    "path": "deep_neural_network_v1.py",
    "chars": 6238,
    "preview": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_s"
  },
  {
    "path": "deep_neural_network_v2.py",
    "chars": 6792,
    "preview": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_s"
  },
  {
    "path": "deep_neural_network_with_L2.py",
    "chars": 7728,
    "preview": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_s"
  },
  {
    "path": "deep_neural_network_with_dropout.py",
    "chars": 8634,
    "preview": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_s"
  },
  {
    "path": "deep_neural_network_with_gd.py",
    "chars": 10253,
    "preview": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_s"
  },
  {
    "path": "deep_neural_network_with_optimizers.py",
    "chars": 26558,
    "preview": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_s"
  },
  {
    "path": "dinos.txt",
    "chars": 19909,
    "preview": "Aachenosaurus\nAardonyx\nAbdallahsaurus\nAbelisaurus\nAbrictosaurus\nAbrosaurus\nAbydosaurus\nAcanthopholis\nAchelousaurus\nAcher"
  },
  {
    "path": "gradient_checking.py",
    "chars": 8339,
    "preview": "import numpy as np\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_selection import train_test_split"
  },
  {
    "path": "rnn.py",
    "chars": 13033,
    "preview": "import numpy as np\n\n\ndef initialize_parameters(n_a, n_x, n_y):\n\t\"\"\"\n\tInitialize parameters with small random values\n\tRet"
  }
]

About this extraction

This page contains the full source code of the tz28/deep-learning GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 14 files (142.5 KB), approximately 48.2k tokens, and a symbol index with 158 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo