[
  {
    "path": "README.md",
    "content": "# deep-learning\npersonal practice\n---------------\n深度学习个人练习，该项目实现了深度学习中一些常用的算法，内容包括：\n\n+ 四种初始化方法：zero initialize, random initialize, xavier initialize, he initialize。\n\n+ 深度神经网络\n\n+ 正则化\n\n+ dropout\n\n+ 三种梯度下降方法：BGD, SGD, mini-batch\n\n+ 六种优化算法：momentum、nesterov momentum、Adagrad、Adadelta、RMSprop、Adam\n\n+ 梯度检验\n\n+ batch normalization\n\n+ recurrent neural network (RNN)\n------\n\n![#f03c15](https://placehold.it/15/f03c15/000000?text=+) ***Note: 下列 1-10中网络架构主要为四大块： initialize parameters、forward propagation、backward propagation、 update parameters，其中在 fp 和 bp 的时候各个功能没有单独封装，这样会导致耦合度过高，结构不清晰。\n11中优化了网络结构，使得耦合度更低，网络结构推荐用11中的结构。<br>\n今天（2018-9-23）重构了神经网络架构（见 deep_neural_network_release.py），把各功能函数分离出来，耦合度更低，结构更清楚，bp过程更加清晰。推荐此版本，用1-10时，可用此版本替换相应代码***\n\n1、**deep_neural_network_v1.py**：自己实现的最简单的深度神经网络（多层感知机),不包含正则化,dropout,动量等...总之是最基本的,只有fp和bp。\n\n2、**deep_neural_network_v2.py**:  自己实现的最简单的深度神经网络（多层感知机）,和v1的唯一区别在于：v1中fp过程,caches每一层存储的是（w,b,z,A_pre）,\n而v2每一层存储的是（w,b,z,A）, 第0层存储的（None,None,None,X）,X即A0。    `个人更推荐用v2版本`.\n\n关于具体的推导实现讲解，请移步本人的CSDN博客：https://blog.csdn.net/u012328159/article/details/79485767\n\n3、**deep_neural_network_ng.py**: ---改正版ng在Coursera上的深度神经网络<br>\n**具体主要改正的是对relu激活函数的求导，具体内容为:<br>**\n```python\ndef relu_backward(dA, cache):\n\t\"\"\"\n\tImplement the backward propagation for a single RELU unit.\n\t\n\tArguments:\n\tdA -- post-activation gradient, of any shape\n\tcache -- 'Z' where we store for computing backward propagation efficiently\n\tReturns:\n\tdZ -- Gradient of the cost with respect to Z\n\t\"\"\"\n\tZ = cache\n\tdZ = dA * np.int64(Z > 0)\n\treturn dZ\n```\n**ng在作业中写的relu导数（个人认为是错的）为：<br>**\n```python\ndef relu_backward(dA, cache):\n    \"\"\"\n    Implement the backward propagation for a single RELU unit.\n    Arguments:\n    \n    dA -- post-activation gradient, of any shape\n    cache -- 'Z' where we store for computing backward propagation efficiently\n    Returns:\n    dZ -- Gradient of the cost with respect to Z\n    \"\"\"\n    Z = cache\n    dZ = np.array(dA, copy=True) # just converting dz to a correct object.\n    \n    # When z <= 0, you should set dz to 0 as well. 
\n    dZ[Z <= 0] = 0\n    \n    assert (dZ.shape == Z.shape)\n    \n    return dZ\n```\n\n4、**compare_initializations.py**： 比较了四种初始化方法（初始化为0，随机初始化，Xavier initialization和He initialization），具体效果见CSDN博客：https://blog.csdn.net/u012328159/article/details/80025785\n\n5、 **deep_neural_network_with_L2.py**: 带L2正则项正则项的网络（在deep_neural_network.py的基础上增加了L2正则项）\n\n6、 **deep_neural_network_with_dropout.py** ：带dropout正则项的网络（在deep_neural_network.py的基础上增加了dropout正则项），具体详见CSDN博客：https://blog.csdn.net/u012328159/article/details/80210363\n\n7、 **gradient_checking.py** ： use gradient checking in dnn，梯度检验，可以检查自己手撸的bp是否正确。具体原理，详见我的CSDN博客：https://blog.csdn.net/u012328159/article/details/80232585\n\n8、 **deep_neural_network_with_gd.py** ：实现了三种梯度下降，包括：batch gradient descent（BGD）、stochastic gradient descent（SGD）和 mini-batch gradient descent。具体内容见我的CSDN博客：https://blog.csdn.net/u012328159/article/details/80252012\n\n9、 **deep_neural_network_with_optimizers.py** ：实现了深度学习中几种优化器，包括：momentum、nesterov momentum、Adagrad、Adadelta、RMSprop、Adam。关于这几种算法，具体内容，见本人的CSDN博客：https://blog.csdn.net/u012328159/article/details/80311892\n\n10、 **机器学习资料整理.pdf** ：整理了一些我知道的机器学习资料，希望能够帮助到想学习的同学。博客同步地址：https://blog.csdn.net/u012328159/article/details/80574713\n\n11、 **batch_normalization.py** ：实现了batch normalization, 改进了整个网络的架构，使得网络的架构更加清晰，耦合度更低。关于batch normalization的具体内容，见本人的CSDN博客：https://blog.csdn.net/u012328159/article/details/82840084\n\n12、 **deep_neural_network_release.py**：重构了深度神经网络，把各功能函数分离出来，耦合度更低，结构更清楚，bp过程更加清晰。**推荐此版本**\n\n13、**rnn.py**：recurrent neural network，最简单的循环神经网络（确切来说是基于字符的），输入输出采用one-hot，场景：生成单词。包括，梯度裁剪、字符采样。关于bp推导，详见本人CSDN博客：https://blog.csdn.net/u012328159/article/details/84962285\n\n<br>\n<br>\n--------\n动态更新.................\n"
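\nSince all of the fully-connected networks above share the same four-block structure, here is a minimal sketch of that training loop, using the function names from deep_neural_network_release.py (`layer_dims`, `X`, `Y`, `learning_rate`, and `num_iterations` are assumed to be defined); it is illustrative only, not a runnable script on its own:\n\n```python\nparameters = initialize_parameters(layer_dims)                        # 1. initialize parameters\nfor i in range(num_iterations):\n    AL, caches = forward_propagation(X, parameters)                   # 2. forward propagation\n    cost = compute_cost(AL, Y)\n    grads = backward_propagation(AL, Y, caches)                       # 3. backward propagation\n    parameters = update_parameters(parameters, grads, learning_rate)  # 4. update parameters\n```\n\n<br>\n<br>\n--------\nUpdated continuously.................\n"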
  },
  {
    "path": "batch_normalization.py",
    "content": "# implement the batch normalization\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_selection import train_test_split\n\n\n#initialize parameters(w,b)\ndef initialize_parameters(layer_dims):\n\t\"\"\"\n\t:param layer_dims: list,每一层单元的个数（维度）\n\t\t\tgamma -- scale vector of shape (size of current layer ,1)\n            beta -- offset vector of shape (size of current layer ,1)\n\t:return: parameter: directory store w1,w2,...,wL,b1,...,bL\n\t\t\t bn_param: directory store moving_mean, moving_var\n\t\"\"\"\n\tnp.random.seed(3)\n\tL = len(layer_dims)#the number of layers in the network\n\tparameters = {}\n\t# initialize the exponential weight average\n\tbn_param = {}\n\tfor l in range(1,L):\n\t\tparameters[\"W\" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01\n\t\tparameters[\"b\" + str(l)] = np.zeros((layer_dims[l],1))\n\t\tparameters[\"gamma\" + str(l)] = np.ones((layer_dims[l],1))\n\t\tparameters[\"beta\" + str(l)] = np.zeros((layer_dims[l],1))\n\t\tbn_param[\"moving_mean\" + str(l)] = np.zeros((layer_dims[l], 1))\n\t\tbn_param[\"moving_var\" + str(l)] = np.zeros((layer_dims[l], 1))\n\n\treturn parameters, bn_param\n\ndef relu_forward(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\tA: output of activation\n\t\"\"\"\n\tA = np.maximum(0,Z)\n\treturn A\n\n#implement the activation function(ReLU and sigmoid)\ndef sigmoid_forward(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\t\"\"\"\n\tA = 1 / (1 + np.exp(-Z))\n\treturn A\n\ndef linear_forward(X, W, b):\n\tz = np.dot(W, X) + b\n\treturn z\n\ndef batchnorm_forward(z, gamma, beta, epsilon = 1e-12):\n\t\"\"\"\n\t:param z: the input of activation (z = np.dot(W,A_pre) + b)\n\t:param epsilon: is a constant for denominator is 0\n\t:return: z_out, mean, variance\n\t\"\"\"\n\tmu = np.mean(z, axis=1, keepdims=True)#axis=1按行求均值\n\tvar = np.var(z, axis=1, keepdims=True)\n\tsqrt_var = np.sqrt(var + epsilon)\n\tz_norm = (z - mu) / sqrt_var\n\tz_out = np.multiply(gamma,z_norm) + beta #对应元素点乘\n\treturn z_out, mu, var, z_norm, sqrt_var\n\n\ndef forward_propagation(X, parameters, bn_param, decay = 0.9):\n\t\"\"\"\n\tX -- input dataset, of shape (input size, number of examples)\n    parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"gamma1\",\"beta1\",W2\", \"b2\",\"gamma2\",\"beta2\",...,\"WL\", \"bL\",\"gammaL\",\"betaL\"\n                    W -- weight matrix of shape (size of current layer, size of previous layer)\n                    b -- bias vector of shape (size of current layer,1)\n                    gamma -- scale vector of shape (size of current layer ,1)\n                    beta -- offset vector of shape (size of current layer ,1)\n                    decay -- the parameter of exponential weight average\n                    moving_mean = decay * moving_mean + (1 - decay) * current_mean\n                    moving_var = decay * moving_var + (1 - decay) * moving_var\n                    the moving_mean and moving_var are used for test\n    :return:\n\tAL: the output of the last Layer(y_predict)\n\tcaches: list, every element is a tuple:(A, W,b,gamma,sqrt_var,z_out,Z_norm)\n\t\"\"\"\n\tL = len(parameters) // 4  # number of layer\n\tA = X\n\tcaches = []\n\t# calculate from 1 to L-1 layer\n\tfor l in range(1,L):\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\tgamma = parameters[\"gamma\" + str(l)]\n\t\tbeta = parameters[\"beta\" + str(l)]\n\t\tz = 
linear_forward(A, W, b)\n\t\tz_out, mu, var, z_norm, sqrt_var = batchnorm_forward(z, gamma, beta) #batch normalization\n\t\tcaches.append((A, W, b, gamma, sqrt_var, z_out, z_norm)) #以激活单元为分界线，把做激活前的变量放在一起，激活后可以认为是下一层的x了\n\t\tA = relu_forward(z_out) #relu activation function\n\t\t#exponential weight average for test\n\t\tbn_param[\"moving_mean\" + str(l)] = decay * bn_param[\"moving_mean\" + str(l)] + (1 - decay) * mu\n\t\tbn_param[\"moving_var\" + str(l)] = decay * bn_param[\"moving_var\" + str(l)] + (1 - decay) * var\n\t# calculate Lth layer(last layer)\n\tWL = parameters[\"W\" + str(L)]\n\tbL = parameters[\"b\" + str(L)]\n\tzL = linear_forward(A, WL, bL)\n\tcaches.append((A, WL, bL, None, None, None, None))\n\tAL = sigmoid_forward(zL)\n\treturn AL, caches, bn_param\n\n#calculate cost function\ndef compute_cost(AL,Y):\n\t\"\"\"\n\t:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)\n\t:param Y:真实值,shape:(1, number of examples)\n\t:return:\n\t\"\"\"\n\tm = Y.shape[1]\n\t# cost = -1.0/m * np.sum(Y*np.log(AL)+(1-Y)*np.log(1.0 - AL))#py中*是点乘\n\t# cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T)) #推荐用这个，上面那个容易出错\n\tcost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) +\n\t                          np.multiply(-np.log(1 - AL), 1 - Y))\n\t#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2\n\tcost = np.squeeze(cost)\n\t# print('=====================cost===================')\n\t# print(cost)\n\treturn cost\n\n#derivation of relu\ndef relu_backward(dA, Z):\n\t\"\"\"\n\t:param Z: the input of activation function\n\t:return:\n\t\"\"\"\n\tdout = np.multiply(dA, np.int64(Z > 0))\n\treturn dout\n\ndef batchnorm_backward(dout, cache):\n\t\"\"\"\n\t:param dout: Upstream derivatives\n\t:param cache:\n\t:return:\n\t\"\"\"\n\t_, _, _, gamma, sqrt_var, _, Z_norm = cache\n\tm = dout.shape[1]\n\tdgamma = np.sum(dout*Z_norm, axis=1, keepdims=True) #*作用于矩阵时为点乘\n\tdbeta = np.sum(dout, axis=1, keepdims=True)\n\tdy = 1./m * gamma * sqrt_var * (m * dout - np.sum(dout, axis=1, keepdims=True) - Z_norm*np.sum(dout*Z_norm, axis=1, keepdims=True))\n\treturn dgamma, dbeta, dy\n\ndef linear_backward(dZ, cache):\n\t\"\"\"\n\t:param dZ: Upstream derivative, the shape (n^[l+1],m)\n\t:param A: input of this layer\n\t:return:\n\t\"\"\"\n\tA, W, _, _, _, _, _ = cache\n\tdW = np.dot(dZ, A.T)\n\tdb = np.sum(dZ, axis=1, keepdims=True)\n\tda = np.dot(W.T, dZ)\n\treturn da, dW, db\n\ndef backward_propagation(AL, Y, caches):\n\t\"\"\"\n\tImplement the backward propagation presented in figure 2.\n\tArguments:\n\tY -- true \"label\" vector (containing 0 if cat, 1 if non-cat)\n\tcaches -- caches output from forward_propagation(),(w,b,gamma,sqrt_var,z_out,Z_norm,A)\n\n\tReturns:\n\tgradients -- A dictionary with the gradients with respect to dW,db\n\t\"\"\"\n\tm = Y.shape[1]\n\tL = len(caches)-1\n\t# print(\"L:   \" + str(L))\n\t#calculate the Lth layer gradients\n\tdz = 1./m * (AL - Y)\n\tda, dWL, dbL = linear_backward(dz, caches[L])\n\tgradients = {\"dW\"+str(L+1): dWL, \"db\"+str(L+1): dbL}\n\t#calculate from L-1 to 1 layer gradients\n\tfor l in reversed(range(0,L)): # L-1,L-3,....,1\n\t\t#relu_backward->batchnorm_backward->linear backward\n\t\tA, w, b, gamma, sqrt_var, z_out, z_norm = caches[l]\n\t\t#relu backward\n\t\tdout = relu_backward(da,z_out)\n\t\t#batch normalization\n\t\tdgamma, dbeta, dz = batchnorm_backward(dout,caches[l])\n\t\t# print(\"===============dz\" + str(l+1) + \"===================\")\n\t\t# print(dz.shape)\n\t\t#linear backward\n\t\tda, dW, db = linear_backward(dz,caches[l])\n\t\t# 
print(\"===============dw\"+ str(l+1) +\"=============\")\n\t\t# print(dW.shape)\n\t\t#gradient\n\t\tgradients[\"dW\" + str(l+1)] = dW\n\t\tgradients[\"db\" + str(l+1)] = db\n\t\tgradients[\"dgamma\" + str(l+1)] = dgamma\n\t\tgradients[\"dbeta\" + str(l+1)] = dbeta\n\treturn gradients\n\ndef update_parameters(parameters, grads, learning_rate):\n\t\"\"\"\n\t:param parameters: dictionary, W, b\n\t:param grads: dW,db,dgamma,dbeta\n\t:param learning_rate: alpha\n\t:return:\n\t\"\"\"\n\tL = len(parameters) // 4\n\tfor l in range(L):\n\t\tparameters[\"W\" + str(l + 1)] = parameters[\"W\" + str(l + 1)] - learning_rate * grads[\"dW\" + str(l+1)]\n\t\tparameters[\"b\" + str(l + 1)] = parameters[\"b\" + str(l + 1)] - learning_rate * grads[\"db\" + str(l+1)]\n\t\tif l < L-1:\n\t\t\tparameters[\"gamma\" + str(l + 1)] = parameters[\"gamma\" + str(l + 1)] - learning_rate * grads[\"dgamma\" + str(l + 1)]\n\t\t\tparameters[\"beta\" + str(l + 1)] = parameters[\"beta\" + str(l + 1)] - learning_rate * grads[\"dbeta\" + str(l + 1)]\n\treturn parameters\n\ndef random_mini_batches(X, Y, mini_batch_size = 64, seed=1):\n\t\"\"\"\n\tCreates a list of random minibatches from (X, Y)\n\tArguments:\n\tX -- input data, of shape (input size, number of examples)\n\tY -- true \"label\" vector (1 for blue dot / 0 for red dot), of shape (1, number of examples)\n\tmini_batch_size -- size of the mini-batches, integer\n\n\tReturns:\n\tmini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)\n\t\"\"\"\n\tnp.random.seed(seed)\n\tm = X.shape[1]  # number of training examples\n\tmini_batches = []\n\n\t# Step 1: Shuffle (X, Y)\n\tpermutation = list(np.random.permutation(m))\n\tshuffled_X = X[:, permutation]\n\tshuffled_Y = Y[:, permutation].reshape((1, m))\n\n\t# Step 2: Partition (shuffled_X, shuffled_Y). 
Minus the end case.\n\tnum_complete_minibatches = m // mini_batch_size  # number of mini batches of size mini_batch_size in your partitionning\n\tfor k in range(0, num_complete_minibatches):\n\t\tmini_batch_X = shuffled_X[:, k * mini_batch_size: (k + 1) * mini_batch_size]\n\t\tmini_batch_Y = shuffled_Y[:, k * mini_batch_size: (k + 1) * mini_batch_size]\n\t\tmini_batch = (mini_batch_X, mini_batch_Y)\n\t\tmini_batches.append(mini_batch)\n\n\t# Handling the end case (last mini-batch < mini_batch_size)\n\tif m % mini_batch_size != 0:\n\t\tmini_batch_X = shuffled_X[:, num_complete_minibatches * mini_batch_size: m]\n\t\tmini_batch_Y = shuffled_Y[:, num_complete_minibatches * mini_batch_size: m]\n\t\tmini_batch = (mini_batch_X, mini_batch_Y)\n\t\tmini_batches.append(mini_batch)\n\n\treturn mini_batches\n\n\ndef L_layer_model(X, Y, layer_dims, learning_rate, num_iterations, mini_batch_size = 64):\n\t\"\"\"\n\t:param X:\n\t:param Y:\n\t:param layer_dims: list containing the input size and each layer size\n\t:param learning_rate:\n\t:param num_iterations:\n\t:return:\n\tparameters：final parameters:(W,b,gamma,beta)\n\tbn_param: moving_mean, moving_var\n\t\"\"\"\n\tcosts = []\n\t# initialize parameters\n\tparameters, bn_param = initialize_parameters(layer_dims)\n\tseed = 0\n\tfor i in range(0, num_iterations):\n\t\tseed = seed + 1\n\t\tminibatches = random_mini_batches(X, Y, mini_batch_size, seed)\n\t\tfor minibatch in minibatches:\n\t\t\t# Select a minibatch\n\t\t\t(minibatch_X, minibatch_Y) = minibatch\n\t\t\t#foward propagation\n\t\t\tAL,caches,bn_param = forward_propagation(minibatch_X, parameters,bn_param)\n\t\t\t# calculate the cost\n\t\t\tcost = compute_cost(AL, minibatch_Y)\n\t\t\t#backward propagation\n\t\t\tgrads = backward_propagation(AL, minibatch_Y, caches)\n\t\t\t#update parameters\n\t\t\tparameters = update_parameters(parameters, grads, learning_rate)\n\t\tif i % 200 == 0:\n\t\t\tprint(\"Cost after iteration {}: {}\".format(i, cost))\n\t\t\tcosts.append(cost)\n\tprint('length of cost')\n\tprint(len(costs))\n\tplt.clf()\n\tplt.plot(costs)  # o-:圆形\n\tplt.xlabel(\"iterations(thousand)\")  # 横坐标名字\n\tplt.ylabel(\"cost\")  # 纵坐标名字\n\tplt.show()\n\treturn parameters,bn_param\n\n#fp for test\ndef forward_propagation_for_test(X, parameters, bn_param, epsilon = 1e-12):\n\t\"\"\"\n\tX -- input dataset, of shape (input size, number of examples)\n    parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"gamma1\",\"beta1\",W2\", \"b2\",\"gamma2\",\"beta2\",...,\"WL\", \"bL\",\"gammaL\",\"betaL\"\n                    W -- weight matrix of shape (size of current layer, size of previous layer)\n                    b -- bias vector of shape (size of current layer,1)\n                    gamma -- scale vector of shape (size of current layer ,1)\n                    beta -- offset vector of shape (size of current layer ,1)\n                    decay -- the parameter of exponential weight average\n                    moving_mean = decay * moving_mean + (1 - decay) * current_mean\n                    moving_var = decay * moving_var + (1 - decay) * moving_var\n                    the moving_mean and moving_var are used for test\n    :return:\n\tAL: the output of the last Layer(y_predict)\n\tcaches: list, every element is a tuple:(A, W,b,gamma,sqrt_var,z,Z_norm)\n\t\"\"\"\n\tL = len(parameters) // 4  # number of layer\n\tA = X\n\t# calculate from 1 to L-1 layer\n\tfor l in range(1,L):\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\tgamma = 
parameters[\"gamma\" + str(l)]\n\t\tbeta = parameters[\"beta\" + str(l)]\n\t\tz = linear_forward(A, W, b)\n\t\t#batch normalization\n\t\t# exponential weight average\n\t\tmoving_mean = bn_param[\"moving_mean\" + str(l)]\n\t\tmoving_var = bn_param[\"moving_var\" + str(l)]\n\t\tsqrt_var = np.sqrt(moving_var + epsilon)\n\t\tz_norm = (z - moving_mean) / sqrt_var\n\t\tz_out = np.multiply(gamma, z_norm) + beta  # 对应元素点乘\n\t\t#relu forward\n\t\tA = relu_forward(z_out) #relu activation function\n\n\t# calculate Lth layer(last layer)\n\tWL = parameters[\"W\" + str(L)]\n\tbL = parameters[\"b\" + str(L)]\n\tzL = linear_forward(A, WL, bL)\n\tAL = sigmoid_forward(zL)\n\treturn AL\n\n\n\n#predict function\ndef predict(X_test, y_test, parameters, bn_param):\n\t\"\"\"\n\t:param X:\n\t:param y:\n\t:param parameters:\n\t:return:\n\t\"\"\"\n\tm = y_test.shape[1]\n\tY_prediction = np.zeros((1, m))\n\tprob = forward_propagation_for_test(X_test, parameters, bn_param)\n\tfor i in range(prob.shape[1]):\n\t\t# Convert probabilities A[0,i] to actual predictions p[0,i]\n\t\tif prob[0, i] > 0.5:\n\t\t\tY_prediction[0, i] = 1\n\t\telse:\n\t\t\tY_prediction[0, i] = 0\n\taccuracy = 1- np.mean(np.abs(Y_prediction - y_test))\n\treturn accuracy\n\n#DNN model\ndef DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.01, num_iterations=10000, mini_batch_size=64):\n\tparameters, bn_param = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations, mini_batch_size)\n\ttrain_accuracy = predict(X_train, y_train, parameters, bn_param)\n\ttest_accuracy = predict(X_test,y_test,parameters,bn_param)\n\treturn train_accuracy, test_accuracy\n\n\nif __name__ == \"__main__\":\n\tX_data, y_data = load_breast_cancer(return_X_y=True)\n\tX_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,test_size=0.2,random_state=28)\n\tX_train = X_train.T\n\ty_train = y_train.reshape(y_train.shape[0], -1).T\n\tX_test = X_test.T\n\ty_test = y_test.reshape(y_test.shape[0], -1).T\n\ttrain_accuracy, test_accuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],10,5,1], mini_batch_size = 256)\n\tprint('train accuracy: ', train_accuracy)\n\tprint('test accuracy: ', test_accuracy)"
  },
  {
    "path": "compare_initializations.py",
    "content": "\n#对比几种初始化方法\nimport numpy as np\nimport matplotlib.pyplot as plt\n\n#初始化为0\ndef initialize_parameters_zeros(layers_dims):\n\t\"\"\"\n\tArguments:\n\tlayer_dims -- python array (list) containing the size of each layer.\n\tReturns:\n\tparameters -- python dictionary containing your parameters \"W1\", \"b1\", ..., \"WL\", \"bL\":\n\t\t\t\t\tW1 -- weight matrix of shape (layers_dims[1], layers_dims[0])\n\t\t\t\t\tb1 -- bias vector of shape (layers_dims[1], 1)\n\t\t\t\t\t...\n\t\t\t\t\tWL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])\n\t\t\t\t\tbL -- bias vector of shape (layers_dims[L], 1)\n\t\"\"\"\n\tparameters = {}\n\tL = len(layers_dims)  # number of layers in the network\n\n\tfor l in range(1, L):\n\t\tparameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l - 1]))\n\t\tparameters['b' + str(l)] = np.zeros((layers_dims[l], 1))\n\treturn parameters\n\n#随机初始化\ndef initialize_parameters_random(layers_dims):\n\t\"\"\"\n\tArguments:\n\tlayer_dims -- python array (list) containing the size of each layer.\n\n\tReturns:\n\tparameters -- python dictionary containing your parameters \"W1\", \"b1\", ..., \"WL\", \"bL\":\n\t\t\t\t\tW1 -- weight matrix of shape (layers_dims[1], layers_dims[0])\n\t\t\t\t\tb1 -- bias vector of shape (layers_dims[1], 1)\n\t\t\t\t\t...\n\t\t\t\t\tWL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])\n\t\t\t\t\tbL -- bias vector of shape (layers_dims[L], 1)\n\t\"\"\"\n\tnp.random.seed(3)  # This seed makes sure your \"random\" numbers will be the as ours\n\tparameters = {}\n\tL = len(layers_dims)  # integer representing the number of layers\n\tfor l in range(1, L):\n\t\tparameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1])*0.01\n\t\tparameters['b' + str(l)] = np.zeros((layers_dims[l], 1))\n\treturn parameters\n\n#xavier initialization\ndef initialize_parameters_xavier(layers_dims):\n\t\"\"\"\n\tArguments:\n\tlayer_dims -- python array (list) containing the size of each layer.\n\n\tReturns:\n\tparameters -- python dictionary containing your parameters \"W1\", \"b1\", ..., \"WL\", \"bL\":\n\t\t\t\t\tW1 -- weight matrix of shape (layers_dims[1], layers_dims[0])\n\t\t\t\t\tb1 -- bias vector of shape (layers_dims[1], 1)\n\t\t\t\t\t...\n\t\t\t\t\tWL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])\n\t\t\t\t\tbL -- bias vector of shape (layers_dims[L], 1)\n\t\"\"\"\n\tnp.random.seed(3)\n\tparameters = {}\n\tL = len(layers_dims)  # integer representing the number of layers\n\tfor l in range(1, L):\n\t\tparameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * np.sqrt(1 / layers_dims[l - 1])\n\t\tparameters['b' + str(l)] = np.zeros((layers_dims[l], 1))\n\treturn parameters\n\n#He initialization\ndef initialize_parameters_he(layers_dims):\n\t\"\"\"\n\tArguments:\n\tlayer_dims -- python array (list) containing the size of each layer.\n\n\tReturns:\n\tparameters -- python dictionary containing your parameters \"W1\", \"b1\", ..., \"WL\", \"bL\":\n\t\t\t\t\tW1 -- weight matrix of shape (layers_dims[1], layers_dims[0])\n\t\t\t\t\tb1 -- bias vector of shape (layers_dims[1], 1)\n\t\t\t\t\t...\n\t\t\t\t\tWL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])\n\t\t\t\t\tbL -- bias vector of shape (layers_dims[L], 1)\n\t\"\"\"\n\tnp.random.seed(3)\n\tparameters = {}\n\tL = len(layers_dims)  # integer representing the number of layers\n\n\tfor l in range(1, L):\n\t\tparameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * np.sqrt(2 / layers_dims[l - 
1])\n\t\tparameters['b' + str(l)] = np.zeros((layers_dims[l], 1))\n\treturn parameters\n\ndef relu(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\tA: output of activation\n\t\"\"\"\n\tA = np.maximum(0,Z)\n\treturn A\n\n\ndef initialize_parameters(layer_dims):\n\t\"\"\"\n\t:param layer_dims: list,每一层单元的个数（维度）\n\t:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL\n\t\"\"\"\n\tnp.random.seed(3)\n\tL = len(layer_dims)#the number of layers in the network\n\tparameters = {}\n\tfor l in range(1,L):\n\t\tparameters[\"W\" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*np.sqrt(2 / layer_dims[l - 1])\n\t\tparameters[\"b\" + str(l)] = np.zeros((layer_dims[l],1))\n\treturn parameters\n\ndef forward_propagation(initialization=\"he\"):\n\tdata = np.random.randn(1000, 100000)\n\tlayers_dims = [1000,800,500,300,200,100,10]\n\tnum_layers = len(layers_dims)\n\t# Initialize parameters dictionary.\n\tif initialization == \"zeros\":\n\t\tparameters = initialize_parameters_zeros(layers_dims)\n\telif initialization == \"random\":\n\t\tparameters = initialize_parameters_random(layers_dims)\n\telif initialization == \"xavier\":\n\t\tparameters = initialize_parameters_xavier(layers_dims)\n\telif initialization == \"he\":\n\t\tparameters = initialize_parameters_he(layers_dims)\n\tA = data\n\tfor l in range(1,num_layers):\n\t\tA_pre = A\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\tz = np.dot(W,A_pre) + b #计算z = wx + b\n\t\t# A = np.tanh(z) #relu activation function\n\t\tA = relu(z)\n\t\tplt.subplot(2,3,l)\n\t\tplt.hist(A.flatten(),facecolor='g')\n\t\tplt.xlim([-1,1])\n\t\tplt.yticks([])\n\tplt.show()\n\nif __name__ == '__main__':\n\tforward_propagation()"
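\n\n# Optional sketch, not part of the original script: print each layer's activation\n# standard deviation so the schemes can be compared numerically as well as visually.\n# It reuses the initializers and relu defined above; the batch is kept small here.\ndef activation_stds(initialization=\"he\"):\n\tnp.random.seed(1)\n\tdata = np.random.randn(1000, 1000)\n\tlayers_dims = [1000, 800, 500, 300, 200, 100, 10]\n\tinit_fns = {\"zeros\": initialize_parameters_zeros, \"random\": initialize_parameters_random,\n\t            \"xavier\": initialize_parameters_xavier, \"he\": initialize_parameters_he}\n\tparameters = init_fns[initialization](layers_dims)\n\tA = data\n\tfor l in range(1, len(layers_dims)):\n\t\tA = relu(np.dot(parameters[\"W\" + str(l)], A) + parameters[\"b\" + str(l)])\n\t\tprint(initialization, \"layer\", l, \"std:\", np.std(A))\n# activation_stds()  # uncomment to run"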
  },
  {
    "path": "deep_neural_network_ng.py",
    "content": "import numpy as np\nfrom machine_learning.deep_neural_network.init_utils import load_dataset\nimport matplotlib.pyplot as plt\n\n#initialize parameters(w,b)\ndef initialize_parameters(layer_dims):\n\t\"\"\"\n\t:param layer_dims: list,每一层单元的个数（维度）\n\t:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL\n\t\"\"\"\n\tL = len(layer_dims)#the number of layers in the network\n\tparameters = {}\n\tnp.random.seed(3)\n\tfor l in range(1,L):\n\t\tparameters[\"W\" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])\n\t\tparameters[\"b\" + str(l)] = np.zeros((layer_dims[l],1))\n\treturn parameters\n#Implement the linear part of a layer's forward propagation: z = w[l] * a[l-1] + b[l]\ndef linear_forward(A_pre,W,b):\n\t\"\"\"\n\t:param A_pre:上一层的激活值,shape:(size of previous layer,m)\n\t:param W: weight matrix,shape:(size of current layer,size of previous layer)\n\t:param b: bias vector,shape:(size of current layer,1)\n\t:return:\n\tZ：激活函数的输入值（线性相加和）\n\tcache：因为bp的时候要用到w，b，a所以要把每一层的都存起来，方便后面用\n\t\"\"\"\n\tZ = np.dot(W,A_pre) + b\n\tcache = (A_pre,W,b)\n\treturn Z,cache\n#implement the activation function(ReLU and sigmoid)\ndef relu(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\tA: output of activation\n\tactivation_cache: 要把Z保存起来，因为后面bp，对relu求导，求dz的时候要用到\n\t\"\"\"\n\tA = np.maximum(0,Z)\n\tactivation_cache = Z #要把Z保存起来，因为后面bp，对relu求导，求dz的时候要用到\n\treturn A, activation_cache\n#implement the activation function(ReLU and sigmoid)\ndef sigmoid(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\t\"\"\"\n\tA = 1 / (1 + np.exp(-Z))\n\tactivation_cache = Z\n\treturn A,activation_cache\n#calculate the output of the activation\ndef linear_activation_forward(A_pre,W,b,activation):\n\t\"\"\"\n\t:param A_pre: activations from previous layer,shape(size of previous layer, number of examples)\n\t:param W:weights matrix,shape(size of current layer, size of previous layer)\n\t:param b:bias vector, shape(size of the current layer, 1)\n\t:param activation:the activation to be used in this layer(ReLu or sigmoid)\n\t:return:\n\tA: the output of the activation function\n\tcache: tuple,形式为:((A_pre,W,b),Z),后面bp要用到的((A_pre,W,b),Z)\n\t\"\"\"\n\tif activation == \"sigmoid\":\n\t\tZ, linear_cache = linear_forward(A_pre,W,b)#linear_cache:(A_pre,W,b)\n\t\tA, activation_cache = sigmoid(Z)# activation_cache: Z\n\telif activation == \"relu\":\n\t\tZ, linear_cache = linear_forward(A_pre, W, b)#linear_cache:(A_pre,W,b)\n\t\tA, activation_cache = relu(Z)# activation_cache: Z\n\tcache = (linear_cache, activation_cache)\n\treturn A, cache\n# Implement the forward propagation of the L-layer model\ndef L_model_forward(X,parameters):\n\t\"\"\"\n\t:param X: data set,input matrix,shape(feature dimensions,number of example)\n\t:param parameters: W,b\n\t:return:\n\tAL: activation of Lth layer i.e. 
y_hat(y_predict)\n\tcaches: list,存储每一层的linear_cache(A_pre,W,b),activation_cache(Z)\n\t\"\"\"\n\tcaches = []#用于存储每一层的，A_pre,W,b,Z\n\tA = X\n\tL = len(parameters) // 2 # number of layer\n\t#calculate from 1 to L-1 layer activation\n\tfor l in range(1,L):\n\t\tA_pre = A\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\tA, cache = linear_activation_forward(A_pre, W, b, \"relu\")\n\t\tcaches.append(cache)\n\t#calculate Lth layer activation\n\tAL, cache = linear_activation_forward(A,parameters[\"W\" + str(L)],parameters[\"b\" + str(L)],\"sigmoid\")\n\tcaches.append(cache)\n\t# print(\"W1: \" + str(caches[0][0][1].shape))\n\t# print(caches[0][0][1])\n\t# print(\"b1: \" + str(caches[0][0][2].shape))\n\t# print(caches[0][0][2])\n\t# print(\"Z1: \" + str(caches[0][1].shape))\n\t# print(caches[0][1])\n\t# print(\"A1: \" + str(caches[1][0][0].shape))\n\t# print(caches[1][0][0])\n\t# print('==========================')\n\t# print(\"W2: \" + str(caches[1][0][1].shape))\n\t# print(caches[1][0][1])\n\t# print(\"b2: \" + str(caches[1][0][2].shape))\n\t# print(caches[1][0][2])\n\t# print(\"Z2: \" + str(caches[1][1].shape))\n\t# print(caches[1][1])\n\t# print(\"A2: \" + str(caches[2][0][0].shape))\n\t# print(caches[2][0][0])\n\t# print('==========================')\n\t# print(\"W3: \" + str(caches[2][0][1].shape))\n\t# print(caches[2][0][1])\n\t# print(\"b3: \" + str(caches[2][0][2].shape))\n\t# print(caches[2][0][2])\n\t# print(\"Z3: \" + str(caches[2][1].shape))\n\t# print(caches[2][1])\n\t# print(\"A3: \" + str(AL.shape))\n\t# print(AL)\n\treturn AL,caches\n#calculate cost function\ndef compute_cost(AL,Y):\n\t\"\"\"\n\t:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)\n\t:param Y:真实值,shape:(1, number of examples)\n\t:return:\n\t\"\"\"\n\tm = Y.shape[1]\n\tcost = 1. 
/ m * np.nansum(np.multiply(-np.log(AL), Y) + np.multiply(-np.log(1 - AL), 1 - Y))\n\t#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2\n\tcost = np.squeeze(cost)\n\treturn cost\n\ndef sigmoid_backward(dA, Z):\n\t\"\"\"\n\t:param dA:\n\t:param Z:\n\t:return:\n\t\"\"\"\n\ta = 1/(1 + np.exp(-Z))\n\tdZ = dA * a*(1-a)\n\treturn dZ\n\n\ndef relu_backward(dA, cache):\n\t\"\"\"\n\tImplement the backward propagation for a single RELU unit.\n\n\tArguments:\n\tdA -- post-activation gradient, of any shape\n\tcache -- 'Z' where we store for computing backward propagation efficiently\n\n\tReturns:\n\tdZ -- Gradient of the cost with respect to Z\n\t\"\"\"\n\tZ = cache\n\tdZ = dA * np.int64(Z > 0) #\n\treturn dZ\n\t\n#calculate dA_pre,dW,db\ndef linear_backward(dZ, cache):\n\t\"\"\"\n\t:param dZ:\n\t:param cache: 前面fp保存的linear_cache(A_pre,W,b)\n\t:return:\n\t\"\"\"\n\tA_prev, W, b = cache\n\tm = A_prev.shape[1]\n\tdW = 1/m * np.dot(dZ,A_prev.T)#有时候不敢确定是线代乘还是点乘，有个小trick就是dW维度一定和W保持一致，这样就好确定是np.dot()还是*了\n\tdb = 1/m * np.sum(dZ,axis=1,keepdims=True)\n\tdA_pre = np.dot(W.T,dZ)\n\treturn dA_pre,dW,db\n\ndef linear_activation_backward(dA, cache, activation):\n\t\"\"\"\n\t:param dA:\n\t:param cache:\n\t:param activation:\n\t:return:\n\t\"\"\"\n\tlinear_cache, activation_cache = cache #((A_pre,W,b),Z)\n\tif activation == \"relu\":\n\t\tdZ = relu_backward(dA, activation_cache)\n\t\tdA_pre, dW, db = linear_backward(dZ,linear_cache)\n\telif activation == \"sigmoid\":\n\t\tdZ = sigmoid_backward(dA, activation_cache)\n\t\tdA_pre, dW, db = linear_backward(dZ, linear_cache)\n\treturn dA_pre, dW, db\n\n# Implement the backward propagation of the L-layer model\ndef L_model_backward(AL, Y, caches):\n\t\"\"\"\n\t:param AL: 最后一层激活值（i.e y_hat）\n\t:param Y: 实际类别(0,1)\n\t:param caches: fp时各层的((A_pre,W,b),Z)\n\t:return:\n\t\"\"\"\n\tgrads = {}#存放各层的dW，db\n\tL = len(caches)\n\t# print('L:  ' + str(L))\n\t# 这个地方之所以没有1/m，是因为对Z,A等中间变量求导时，直接使用的是交叉熵函数对Z,A求导，\n\t# 而不是cost function，只有对W，b求导时使用cost function\n\t#第L层单独算,因为激活函数是sigmoid\n\tdAL = -(np.divide(Y,AL) - np.divide((1-Y),(1-AL)))\n\t# print(\"dAL: \")\n\t# print(dAL)\n\tcurrent_cache = caches[L - 1]\n\tgrads[\"dA\" + str(L - 1)], grads[\"dW\" + str(L)], grads[\"db\" + str(L)] \\\n\t\t= linear_activation_backward(dAL,current_cache,\"sigmoid\")\n\tfor l in reversed(range(L-1)):\n\t\tcurrent_cache = caches[l]\n\t\tdA_pre_temp, dW_temp, db_temp \\\n\t\t\t= linear_activation_backward(grads[\"dA\" + str(l + 1)],current_cache,\"relu\")\n\t\tgrads[\"dA\" + str(l)] = dA_pre_temp\n\t\tgrads[\"dW\" + str(l+1)] = dW_temp\n\t\tgrads[\"db\" + str(l+1)] = db_temp\n\t# print(\"******************************梯度*************************\")\n\t# print(grads)\n\treturn grads\n# update w,b\ndef update_parameters(parameters, grads, learning_rate):\n\t\"\"\"\n\t:param parameters: dictionary,  W,b\n\t:param grads: dW,db\n\t:param learning_rate: alpha\n\t:return:\n\t\"\"\"\n\tL = len(parameters) // 2\n\tfor l in range(L):\n\t\tparameters[\"W\" + str(l + 1)] = parameters[\"W\" + str(l + 1)] - learning_rate * grads[\"dW\" + str(l+1)]\n\t\tparameters[\"b\" + str(l + 1)] = parameters[\"b\" + str(l + 1)] - learning_rate * grads[\"db\" + str(l+1)]\n\t# print(parameters)\n\treturn parameters\n\ndef L_layer_model(X, Y, layer_dims, learning_rate, num_iterations):\n\t\"\"\"\n\t:param X:\n\t:param Y:\n\t:param layer_dims:list containing the input size and each layer size\n\t:param learning_rate:\n\t:param num_iterations:\n\t:return:\n\tparameters：final parameters:(W,b)\n\t\"\"\"\n\tcosts = []\n\t# initialize 
parameters\n\tparameters = initialize_parameters(layer_dims)\n\tfor i in range(0, num_iterations):\n\t\t#foward propagation\n\t\tAL,caches = L_model_forward(X, parameters)\n\t\t# calculate the cost\n\t\tcost = compute_cost(AL, Y)\n\t\tif i % 1000 == 0:\n\t\t\tprint(\"Cost after iteration {}: {}\".format(i, cost))\n\t\t\tcosts.append(cost)\n\t\t#backward propagation\n\t\tgrads = L_model_backward(AL, Y, caches)\n\t\t#update parameters\n\t\tparameters = update_parameters(parameters, grads, learning_rate)\n\tplt.clf()\n\tplt.plot(costs)  # o-:圆形\n\tplt.xlabel(\"iterations\")  # 横坐标名字\n\tplt.ylabel(\"cost\")  # 纵坐标名字\n\tplt.show()\n\treturn parameters\n\n#predict function\ndef predict(X,y,parameters):\n\t\"\"\"\n\t:param X:\n\t:param y:\n\t:param parameters:\n\t:return:\n\t\"\"\"\n\tm = y.shape[1]\n\tY_prediction = np.zeros((1, m))\n\tprob, caches = L_model_forward(X,parameters)\n\tfor i in range(prob.shape[1]):\n\t\t# Convert probabilities A[0,i] to actual predictions p[0,i]\n\t\t### START CODE HERE ### (≈ 4 lines of code)\n\t\tif prob[0, i] > 0.5:\n\t\t\tY_prediction[0, i] = 1\n\t\telse:\n\t\t\tY_prediction[0, i] = 0\n\taccuracy = 1- np.mean(np.abs(Y_prediction - y))\n\treturn accuracy\n#DNN model\ndef DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.01, num_iterations=15000):\n\tparameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations)\n\tprint('===========================测试集=================================')\n\taccuracy = predict(X_test,y_test,parameters)\n\treturn accuracy\n\nif __name__ == \"__main__\":\n\tX_train, y_train, X_test, y_test = load_dataset()\n\taccuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1])\n\tprint(accuracy)"
  },
  {
    "path": "deep_neural_network_release.py",
    "content": "\"\"\"\n把各部分分离出来，降低耦合度，使得结构更加清晰\n\"\"\"\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_selection import train_test_split\n\n\n\n#initialize parameters(w,b)\ndef initialize_parameters(layer_dims):\n\t\"\"\"\n\t:param layer_dims: list,每一层单元的个数（维度）\n\t:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL\n\t\"\"\"\n\tnp.random.seed(3)\n\tL = len(layer_dims)#the number of layers in the network\n\tparameters = {}\n\tfor l in range(1,L):\n\t\tparameters[\"W\" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.1\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization\n\t\t# parameters[\"W\" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization\n\t\tparameters[\"b\" + str(l)] = np.zeros((layer_dims[l],1))\n\treturn parameters\n\ndef linear_forward(x, w, b):\n\t\"\"\"\n\t:param x:\n\t:param w:\n\t:param b:\n\t:return:\n\t\"\"\"\n\tz = np.dot(w, x) + b  # 计算z = wx + b\n\treturn z\n\ndef relu_forward(Z):\n\t\"\"\"\n\t:param Z: Output of the activation layer\n\t:return:\n\tA: output of activation\n\t\"\"\"\n\tA = np.maximum(0,Z)\n\treturn A\n\n#implement the activation function(ReLU and sigmoid)\ndef sigmoid(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\t\"\"\"\n\tA = 1 / (1 + np.exp(-Z))\n\treturn A\n\ndef forward_propagation(X, parameters):\n\t\"\"\"\n\tX -- input dataset, of shape (input size, number of examples)\n    parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"W2\", \"b2\",...,\"WL\", \"bL\"\n                    W -- weight matrix of shape (size of current layer, size of previous layer)\n                    b -- bias vector of shape (size of current layer,1)\n    :return:\n\tAL: the output of the last Layer(y_predict)\n\tcaches: list, every element is a tuple:(W,b,z,A_pre)\n\t\"\"\"\n\tL = len(parameters) // 2  # number of layer\n\tA = X\n\tcaches = []\n\t# calculate from 1 to L-1 layer\n\tfor l in range(1,L):\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\t#linear forward -> relu forward ->linear forward....\n\t\tz = linear_forward(A, W, b)\n\t\tcaches.append((A, W, b, z))  # 以激活函数为分割，到z认为是这一层的，激活函数的输出值A认为是下一层的输入，划归到下一层。注意cache的位置，要放在relu前面。\n\t\tA = relu_forward(z) #relu activation function\n\t# calculate Lth layer\n\tWL = parameters[\"W\" + str(L)]\n\tbL = parameters[\"b\" + str(L)]\n\tzL = linear_forward(A, WL, bL)\n\tcaches.append((A, WL, bL, zL))\n\tAL = sigmoid(zL)\n\treturn AL, caches\n\n#calculate cost function\ndef compute_cost(AL,Y):\n\t\"\"\"\n\t:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)\n\t:param Y:真实值,shape:(1, number of examples)\n\t:return:\n\t\"\"\"\n\tm = Y.shape[1]\n\t# cost = -1.0/m * np.sum(Y*np.log(AL)+(1-Y)*np.log(1.0 - AL))#py中*是点乘\n\t# cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T)) #推荐用这个，上面那个容易出错\n\tcost = 1. 
/ m * np.nansum(np.multiply(-np.log(AL), Y) +\n\t                          np.multiply(-np.log(1 - AL), 1 - Y))\n\t#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2\n\tcost = np.squeeze(cost)\n\t# print('=====================cost===================')\n\t# print(cost)\n\treturn cost\n\n\n#derivation of relu\ndef relu_backward(dA, Z):\n\t\"\"\"\n\t:param Z: the input of activation function\n\t:param dA:\n\t:return:\n\t\"\"\"\n\tdout = np.multiply(dA, np.int64(Z > 0)) #J对z的求导\n\treturn dout\n\n#derivation of linear\ndef linear_backward(dZ, cache):\n\t\"\"\"\n\t:param dZ: Upstream derivative, the shape (n^[l+1],m)\n\t:param A: input of this layer\n\t:return:\n\t\"\"\"\n\tA, W, b, z = cache\n\tdW = np.dot(dZ, A.T)\n\tdb = np.sum(dZ, axis=1, keepdims=True)\n\tda = np.dot(W.T, dZ)\n\treturn da, dW, db\n\n\ndef backward_propagation(AL, Y, caches):\n\t\"\"\"\n\tImplement the backward propagation presented in figure 2.\n\tArguments:\n\tX -- input dataset, of shape (input size, number of examples)\n\tY -- true \"label\" vector (containing 0 if cat, 1 if non-cat)\n\tcaches -- caches output from forward_propagation(),(W,b,z,pre_A)\n\n\tReturns:\n\tgradients -- A dictionary with the gradients with respect to dW,db\n\t\"\"\"\n\tm = Y.shape[1]\n\tL = len(caches) - 1\n\t#calculate the Lth layer gradients\n\tdz = 1. / m * (AL - Y)\n\tda, dWL, dbL = linear_backward(dz, caches[L])\n\tgradients = {\"dW\" + str(L + 1): dWL, \"db\" + str(L + 1): dbL}\n\n\t#calculate from L-1 to 1 layer gradients\n\tfor l in reversed(range(0,L)): # L-1,L-3,....,0\n\t\tA, W, b, z = caches[l]\n\t\t#ReLu backward -> linear backward\n\t\t#relu backward\n\t\tdout = relu_backward(da, z)\n\t\t#linear backward\n\t\tda, dW, db = linear_backward(dout, caches[l])\n\t\t# print(\"========dW\" + str(l+1) + \"================\")\n\t\t# print(dW.shape)\n\t\tgradients[\"dW\" + str(l+1)] = dW\n\t\tgradients[\"db\" + str(l+1)] = db\n\treturn gradients\n\ndef update_parameters(parameters, grads, learning_rate):\n\t\"\"\"\n\t:param parameters: dictionary,  W,b\n\t:param grads: dW,db\n\t:param learning_rate: alpha\n\t:return:\n\t\"\"\"\n\tL = len(parameters) // 2\n\tfor l in range(L):\n\t\tparameters[\"W\" + str(l + 1)] = parameters[\"W\" + str(l + 1)] - learning_rate * grads[\"dW\" + str(l+1)]\n\t\tparameters[\"b\" + str(l + 1)] = parameters[\"b\" + str(l + 1)] - learning_rate * grads[\"db\" + str(l+1)]\n\treturn parameters\n\ndef L_layer_model(X, Y, layer_dims, learning_rate, num_iterations):\n\t\"\"\"\n\t:param X:\n\t:param Y:\n\t:param layer_dims:list containing the input size and each layer size\n\t:param learning_rate:\n\t:param num_iterations:\n\t:return:\n\tparameters：final parameters:(W,b)\n\t\"\"\"\n\tcosts = []\n\t# initialize parameters\n\tparameters = initialize_parameters(layer_dims)\n\tfor i in range(0, num_iterations):\n\t\t#foward propagation\n\t\tAL,caches = forward_propagation(X, parameters)\n\t\t# calculate the cost\n\t\tcost = compute_cost(AL, Y)\n\t\tif i % 1000 == 0:\n\t\t\tprint(\"Cost after iteration {}: {}\".format(i, cost))\n\t\t\tcosts.append(cost)\n\t\t#backward propagation\n\t\tgrads = backward_propagation(AL, Y, caches)\n\t\t#update parameters\n\t\tparameters = update_parameters(parameters, grads, learning_rate)\n\tprint('length of cost')\n\tprint(len(costs))\n\tplt.clf()\n\tplt.plot(costs)  # o-:圆形\n\tplt.xlabel(\"iterations(thousand)\")  # 横坐标名字\n\tplt.ylabel(\"cost\")  # 纵坐标名字\n\tplt.show()\n\treturn parameters\n\n#predict function\ndef predict(X_test,y_test,parameters):\n\t\"\"\"\n\t:param X:\n\t:param y:\n\t:param 
parameters:\n\t:return:\n\t\"\"\"\n\tm = y_test.shape[1]\n\tY_prediction = np.zeros((1, m))\n\tprob, caches = forward_propagation(X_test,parameters)\n\tfor i in range(prob.shape[1]):\n\t\t# Convert probabilities A[0,i] to actual predictions p[0,i]\n\t\tif prob[0, i] > 0.5:\n\t\t\tY_prediction[0, i] = 1\n\t\telse:\n\t\t\tY_prediction[0, i] = 0\n\taccuracy = 1- np.mean(np.abs(Y_prediction - y_test))\n\treturn accuracy\n\n#DNN model\ndef DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.001, num_iterations=30000):\n\tparameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations)\n\taccuracy = predict(X_test,y_test,parameters)\n\treturn accuracy\n\nif __name__ == \"__main__\":\n\tX_data, y_data = load_breast_cancer(return_X_y=True)\n\tX_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)\n\tX_train = X_train.T\n\ty_train = y_train.reshape(y_train.shape[0], -1).T\n\tX_test = X_test.T\n\ty_test = y_test.reshape(y_test.shape[0], -1).T\n\taccuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],10,5,1])\n\tprint(accuracy)"
  },
  {
    "path": "deep_neural_network_v1.py",
    "content": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_selection import train_test_split\n#initialize parameters(w,b)\ndef initialize_parameters(layer_dims):\n\t\"\"\"\n\t:param layer_dims: list,每一层单元的个数（维度）\n\t:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL\n\t\"\"\"\n\tnp.random.seed(3)\n\tL = len(layer_dims)#the number of layers in the network\n\tparameters = {}\n\tfor l in range(1,L):\n\t\tparameters[\"W\" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01\n\t\tparameters[\"b\" + str(l)] = np.zeros((layer_dims[l],1))\n\treturn parameters\ndef relu(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\tA: output of activation\n\t\"\"\"\n\tA = np.maximum(0,Z)\n\treturn A\n#implement the activation function(ReLU and sigmoid)\ndef sigmoid(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\t\"\"\"\n\tA = 1 / (1 + np.exp(-Z))\n\treturn A\n\ndef forward_propagation(X, parameters):\n\t\"\"\"\n\tX -- input dataset, of shape (input size, number of examples)\n    parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"W2\", \"b2\",...,\"WL\", \"bL\"\n                    W -- weight matrix of shape (size of current layer, size of previous layer)\n                    b -- bias vector of shape (size of current layer,1)\n    :return:\n\tAL: the output of the last Layer(y_predict)\n\tcaches: list, every element is a tuple:(W,b,z,A_pre)\n\t\"\"\"\n\tL = len(parameters) // 2  # number of layer\n\tA = X\n\tcaches = []  # 用于存储每一层的，w,b,z,A\n\t# calculate from 1 to L-1 layer\n\tfor l in range(1,L):\n\t\tA_pre = A\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\tz = np.dot(W,A_pre) + b #计算z = wx + b\n\t\tA = relu(z) #relu activation function\n\t\tcaches.append((W,b,z,A_pre))\n\t# calculate Lth layer\n\tWL = parameters[\"W\" + str(L)]\n\tbL = parameters[\"b\" + str(L)]\n\tzL = np.dot(WL,A) + bL\n\tAL = sigmoid(zL)\n\tcaches.append((WL,bL,zL,A))\n\treturn AL, caches\n#calculate cost function\ndef compute_cost(AL,Y):\n\t\"\"\"\n\t:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)\n\t:param Y:真实值,shape:(1, number of examples)\n\t:return:\n\t\"\"\"\n\tm = Y.shape[1]\n\t# cost = -1.0/m * np.sum(Y*np.log(AL)+(1-Y)*np.log(1.0 - AL))#py中*是点乘\n\t# cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T)) #推荐用这个，上面那个容易出错\n\tcost = 1. 
/ m * np.nansum(np.multiply(-np.log(AL), Y) +\n\t                          np.multiply(-np.log(1 - AL), 1 - Y))\n\t#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2\n\tcost = np.squeeze(cost)\n\treturn cost\n\n# derivation of relu\ndef relu_backward(Z):\n\t\"\"\"\n\t:param Z: the input of activation\n\t:return:\n\t\"\"\"\n\tdA = np.int64(Z > 0)\n\treturn dA\n\ndef backward_propagation(AL, Y, caches):\n\t\"\"\"\n\tImplement the backward propagation presented in figure 2.\n\tArguments:\n\tX -- input dataset, of shape (input size, number of examples)\n\tY -- true \"label\" vector (containing 0 if cat, 1 if non-cat)\n\tcaches -- caches output from forward_propagation(),(W,b,z,pre_A)\n\n\tReturns:\n\tgradients -- A dictionary with the gradients with respect to dW,db\n\t\"\"\"\n\tm = Y.shape[1]\n\tL = len(caches)\n\t# print(\"L:   \" + str(L))\n\t#calculate the Lth layer gradients\n\tprev_AL = caches[L-1][3]\n\tdzL = 1./m * (AL - Y)\n\t# print(dzL.shape)\n\t# print(prev_AL.T.shape)\n\tdWL = np.dot(dzL, prev_AL.T)\n\tdbL = np.sum(dzL, axis=1, keepdims=True)\n\tgradients = {\"dW\"+str(L):dWL, \"db\"+str(L):dbL}\n\t#calculate from L-1 to 1 layer gradients\n\tfor l in reversed(range(0,L-1)):\n\t\tpost_W= caches[l+1][0] #要用后一层的W\n\t\tdz = dzL #用后一层的dz\n\n\t\tdal = np.dot(post_W.T, dz)\n\t\tz = caches[l+1][2]#当前层的z\n\t\tdzl = np.multiply(dal, relu_backward(z))#可以直接用dzl = np.multiply(dal, np.int64(z > 0))来实现\n\t\tprev_A = caches[l][3]#前一层的A\n\t\tdWl = np.dot(dzl, prev_A.T)\n\t\tdbl = np.sum(dzl, axis=1, keepdims=True)\n\n\t\tgradients[\"dW\" + str(l+1)] = dWl\n\t\tgradients[\"db\" + str(l+1)] = dbl\n\t\tdzL = dzl #更新dz\n\treturn gradients\n\ndef update_parameters(parameters, grads, learning_rate):\n\t\"\"\"\n\t:param parameters: dictionary,  W,b\n\t:param grads: dW,db\n\t:param learning_rate: alpha\n\t:return:\n\t\"\"\"\n\tL = len(parameters) // 2\n\tfor l in range(L):\n\t\tparameters[\"W\" + str(l + 1)] = parameters[\"W\" + str(l + 1)] - learning_rate * grads[\"dW\" + str(l+1)]\n\t\tparameters[\"b\" + str(l + 1)] = parameters[\"b\" + str(l + 1)] - learning_rate * grads[\"db\" + str(l+1)]\n\treturn parameters\n\ndef L_layer_model(X, Y, layer_dims, learning_rate, num_iterations):\n\t\"\"\"\n\t:param X:\n\t:param Y:\n\t:param layer_dims:list containing the input size and each layer size\n\t:param learning_rate:\n\t:param num_iterations:\n\t:return:\n\tparameters：final parameters:(W,b)\n\t\"\"\"\n\tcosts = []\n\t# initialize parameters\n\tparameters = initialize_parameters(layer_dims)\n\tfor i in range(0, num_iterations):\n\t\t#foward propagation\n\t\tAL,caches = forward_propagation(X, parameters)\n\t\t# calculate the cost\n\t\tcost = compute_cost(AL, Y)\n\t\tif i % 1000 == 0:\n\t\t\tprint(\"Cost after iteration {}: {}\".format(i, cost))\n\t\t\tcosts.append(cost)\n\t\t#backward propagation\n\t\tgrads = backward_propagation(AL, Y, caches)\n\t\t#update parameters\n\t\tparameters = update_parameters(parameters, grads, learning_rate)\n\tplt.clf()\n\tplt.plot(costs)\n\tplt.xlabel(\"iterations(thousand)\")  # 横坐标名字\n\tplt.ylabel(\"cost\")  # 纵坐标名字\n\tplt.show()\n\treturn parameters\n\n#predict function\ndef predict(X_test,y_test,parameters):\n\t\"\"\"\n\t:param X:\n\t:param y:\n\t:param parameters:\n\t:return:\n\t\"\"\"\n\tm = y_test.shape[1]\n\tY_prediction = np.zeros((1, m))\n\tprob, caches = forward_propagation(X_test,parameters)\n\tfor i in range(prob.shape[1]):\n\t\t# Convert probabilities A[0,i] to actual predictions p[0,i]\n\t\tif prob[0, i] > 0.5:\n\t\t\tY_prediction[0, i] = 
1\n\t\telse:\n\t\t\tY_prediction[0, i] = 0\n\taccuracy = 1- np.mean(np.abs(Y_prediction - y_test))\n\treturn accuracy\n#DNN model\ndef DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.01, num_iterations=15000):\n\tparameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations)\n\taccuracy = predict(X_test,y_test,parameters)\n\treturn accuracy\nif __name__ == \"__main__\":\n\tX_data, y_data = load_breast_cancer(return_X_y=True)\n\tX_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)\n\tX_train = X_train.T\n\ty_train = y_train.reshape(y_train.shape[0], -1).T\n\tX_test = X_test.T\n\ty_test = y_test.reshape(y_test.shape[0], -1).T\n\taccuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],20, 10, 5, 1])\n\tprint(accuracy)"
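\n\n# A small optional sanity check, not part of the original file, in the spirit of\n# gradient_checking.py: compare one analytic gradient from backward_propagation\n# with a centered finite difference of compute_cost on a tiny random problem.\ndef _check_one_gradient(eps=1e-6):\n\tnp.random.seed(1)\n\tX = np.random.randn(3, 5)\n\tY = (np.random.rand(1, 5) > 0.5).astype(np.float64)\n\tparameters = initialize_parameters([3, 4, 1])\n\tAL, caches = forward_propagation(X, parameters)\n\tgrads = backward_propagation(AL, Y, caches)\n\tW1 = parameters[\"W1\"]\n\tW1[0, 0] += eps\n\tcost_plus = compute_cost(forward_propagation(X, parameters)[0], Y)\n\tW1[0, 0] -= 2 * eps\n\tcost_minus = compute_cost(forward_propagation(X, parameters)[0], Y)\n\tW1[0, 0] += eps  # restore the original weight\n\tnumeric = (cost_plus - cost_minus) / (2 * eps)\n\tprint(\"dW1[0,0] analytic vs numeric:\", grads[\"dW1\"][0, 0], numeric)\n# _check_one_gradient()  # uncomment to run the check"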
  },
  {
    "path": "deep_neural_network_v2.py",
    "content": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_selection import train_test_split\n#initialize parameters(w,b)\ndef initialize_parameters(layer_dims):\n\t\"\"\"\n\t:param layer_dims: list,每一层单元的个数（维度）\n\t:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL\n\t\"\"\"\n\tnp.random.seed(3)\n\tL = len(layer_dims)#the number of layers in the network\n\tparameters = {}\n\tfor l in range(1,L):\n\t\tparameters[\"W\" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization\n\t\t# parameters[\"W\" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization\n\t\tparameters[\"b\" + str(l)] = np.zeros((layer_dims[l],1))\n\treturn parameters\ndef relu(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\tA: output of activation\n\t\"\"\"\n\tA = np.maximum(0,Z)\n\treturn A\n#implement the activation function(ReLU and sigmoid)\ndef sigmoid(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\t\"\"\"\n\tA = 1 / (1 + np.exp(-Z))\n\treturn A\n\ndef forward_propagation(X, parameters):\n\t\"\"\"\n\tX -- input dataset, of shape (input size, number of examples)\n    parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"W2\", \"b2\",...,\"WL\", \"bL\"\n                    W -- weight matrix of shape (size of current layer, size of previous layer)\n                    b -- bias vector of shape (size of current layer,1)\n    :return:\n\tAL: the output of the last Layer(y_predict)\n\tcaches: list, every element is a tuple:(W,b,z,A_pre)\n\t\"\"\"\n\tL = len(parameters) // 2  # number of layer\n\tA = X\n\tcaches = [(None,None,None,X)]  # 第0层(None,None,None,A0) w,b,z用none填充,下标与层数一致，用于存储每一层的，w,b,z,A\n\t# calculate from 1 to L-1 layer\n\tfor l in range(1,L):\n\t\tA_pre = A\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\tz = np.dot(W,A_pre) + b #计算z = wx + b\n\t\tA = relu(z) #relu activation function\n\t\tcaches.append((W,b,z,A))\n\t# calculate Lth layer\n\tWL = parameters[\"W\" + str(L)]\n\tbL = parameters[\"b\" + str(L)]\n\tzL = np.dot(WL,A) + bL\n\tAL = sigmoid(zL)\n\tcaches.append((WL,bL,zL,AL))\n\treturn AL, caches\n#calculate cost function\ndef compute_cost(AL,Y):\n\t\"\"\"\n\t:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)\n\t:param Y:真实值,shape:(1, number of examples)\n\t:return:\n\t\"\"\"\n\tm = Y.shape[1]\n\t# cost = -1.0/m * np.sum(Y*np.log(AL)+(1-Y)*np.log(1.0 - AL))#py中*是点乘\n\t# cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T)) #推荐用这个，上面那个容易出错\n\tcost = 1. 
/ m * np.nansum(np.multiply(-np.log(AL), Y) +\n\t                          np.multiply(-np.log(1 - AL), 1 - Y))\n\t#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2\n\tcost = np.squeeze(cost)\n\t# print('=====================cost===================')\n\t# print(cost)\n\treturn cost\n\n# derivation of relu\ndef relu_backward(Z):\n\t\"\"\"\n\t:param Z: the input of activation\n\t:return:\n\t\"\"\"\n\tdA = np.int64(Z > 0)\n\treturn dA\n\ndef backward_propagation(AL, Y, caches):\n\t\"\"\"\n\tImplement the backward propagation presented in figure 2.\n\tArguments:\n\tX -- input dataset, of shape (input size, number of examples)\n\tY -- true \"label\" vector (containing 0 if cat, 1 if non-cat)\n\tcaches -- caches output from forward_propagation(),(W,b,z,pre_A)\n\n\tReturns:\n\tgradients -- A dictionary with the gradients with respect to dW,db\n\t\"\"\"\n\tm = Y.shape[1]\n\tL = len(caches) - 1\n\t# print(\"L:   \" + str(L))\n\t#calculate the Lth layer gradients\n\tprev_AL = caches[L-1][3]\n\tdzL = 1./m * (AL - Y)\n\t# print(dzL.shape)\n\t# print(prev_AL.T.shape)\n\tdWL = np.dot(dzL, prev_AL.T)\n\tdbL = np.sum(dzL, axis=1, keepdims=True)\n\tgradients = {\"dW\"+str(L):dWL, \"db\"+str(L):dbL}\n\t#calculate from L-1 to 1 layer gradients\n\tfor l in reversed(range(1,L)): # L-1,L-3,....,1\n\t\tpost_W= caches[l+1][0] #要用后一层的W\n\t\tdz = dzL #用后一层的dz\n\n\t\tdal = np.dot(post_W.T, dz)\n\t\tZ = caches[l][2]#当前层的Z\n\t\tdzl = np.multiply(dal, relu_backward(Z))#可以直接用dzl = np.multiply(dal, np.int64(Z > 0))来实现\n\t\tprev_A = caches[l-1][3]#前一层的A\n\t\tdWl = np.dot(dzl, prev_A.T)\n\t\tdbl = np.sum(dzl, axis=1, keepdims=True)\n\n\t\tgradients[\"dW\" + str(l)] = dWl\n\t\tgradients[\"db\" + str(l)] = dbl\n\t\tdzL = dzl #更新dz\n\treturn gradients\n\ndef update_parameters(parameters, grads, learning_rate):\n\t\"\"\"\n\t:param parameters: dictionary,  W,b\n\t:param grads: dW,db\n\t:param learning_rate: alpha\n\t:return:\n\t\"\"\"\n\tL = len(parameters) // 2\n\tfor l in range(L):\n\t\tparameters[\"W\" + str(l + 1)] = parameters[\"W\" + str(l + 1)] - learning_rate * grads[\"dW\" + str(l+1)]\n\t\tparameters[\"b\" + str(l + 1)] = parameters[\"b\" + str(l + 1)] - learning_rate * grads[\"db\" + str(l+1)]\n\treturn parameters\n\ndef L_layer_model(X, Y, layer_dims, learning_rate, num_iterations):\n\t\"\"\"\n\t:param X:\n\t:param Y:\n\t:param layer_dims:list containing the input size and each layer size\n\t:param learning_rate:\n\t:param num_iterations:\n\t:return:\n\tparameters：final parameters:(W,b)\n\t\"\"\"\n\tcosts = []\n\t# initialize parameters\n\tparameters = initialize_parameters(layer_dims)\n\tfor i in range(0, num_iterations):\n\t\t#foward propagation\n\t\tAL,caches = forward_propagation(X, parameters)\n\t\t# calculate the cost\n\t\tcost = compute_cost(AL, Y)\n\t\tif i % 1000 == 0:\n\t\t\tprint(\"Cost after iteration {}: {}\".format(i, cost))\n\t\t\tcosts.append(cost)\n\t\t#backward propagation\n\t\tgrads = backward_propagation(AL, Y, caches)\n\t\t#update parameters\n\t\tparameters = update_parameters(parameters, grads, learning_rate)\n\tprint('length of cost')\n\tprint(len(costs))\n\tplt.clf()\n\tplt.plot(costs)  # o-:圆形\n\tplt.xlabel(\"iterations(thousand)\")  # 横坐标名字\n\tplt.ylabel(\"cost\")  # 纵坐标名字\n\tplt.show()\n\treturn parameters\n\n#predict function\ndef predict(X_test,y_test,parameters):\n\t\"\"\"\n\t:param X:\n\t:param y:\n\t:param parameters:\n\t:return:\n\t\"\"\"\n\tm = y_test.shape[1]\n\tY_prediction = np.zeros((1, m))\n\tprob, caches = forward_propagation(X_test,parameters)\n\tfor i in 
range(prob.shape[1]):\n\t\t# Convert probabilities A[0,i] to actual predictions p[0,i]\n\t\tif prob[0, i] > 0.5:\n\t\t\tY_prediction[0, i] = 1\n\t\telse:\n\t\t\tY_prediction[0, i] = 0\n\taccuracy = 1- np.mean(np.abs(Y_prediction - y_test))\n\treturn accuracy\n#DNN model\ndef DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.01, num_iterations=15000):\n\tparameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations)\n\taccuracy = predict(X_test,y_test,parameters)\n\treturn accuracy\nif __name__ == \"__main__\":\n\tX_data, y_data = load_breast_cancer(return_X_y=True)\n\tX_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)\n\tX_train = X_train.T\n\ty_train = y_train.reshape(y_train.shape[0], -1).T\n\tX_test = X_test.T\n\ty_test = y_test.reshape(y_test.shape[0], -1).T\n\taccuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],20, 10, 5, 1])\n\tprint(accuracy)"
  },
  {
    "path": "deep_neural_network_with_L2.py",
    "content": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_selection import train_test_split\n\n#initialize parameters(w,b)\ndef initialize_parameters(layer_dims):\n\t\"\"\"\n\t:param layer_dims: list,每一层单元的个数（维度）\n\t:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL\n\t\"\"\"\n\tnp.random.seed(3)\n\tL = len(layer_dims)#the number of layers in the network\n\tparameters = {}\n\tfor l in range(1,L):\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization\n\t\t# parameters[\"W\" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果\n\t\tparameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization\n\t\tparameters[\"b\" + str(l)] = np.zeros((layer_dims[l],1))\n\treturn parameters\n\ndef relu(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\tA: output of activation\n\t\"\"\"\n\tA = np.maximum(0,Z)\n\treturn A\n#implement the activation function(ReLU and sigmoid)\ndef sigmoid(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\t\"\"\"\n\tA = 1 / (1 + np.exp(-Z))\n\treturn A\n\ndef forward_propagation(X, parameters):\n\t\"\"\"\n\tX -- input dataset, of shape (input size, number of examples)\n    parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"W2\", \"b2\",...,\"WL\", \"bL\"\n                    W -- weight matrix of shape (size of current layer, size of previous layer)\n                    b -- bias vector of shape (size of current layer,1)\n    :return:\n\tAL: the output of the last Layer(y_predict)\n\tcaches: list, every element is a tuple:(W,b,z,A_pre)\n\t\"\"\"\n\tL = len(parameters) // 2  # number of layer\n\tA = X\n\tcaches = [(None,None,None,X)]  # 第0层(None,None,None,A0) w,b,z用none填充,下标与层数一致，用于存储每一层的，w,b,z,A\n\t# calculate from 1 to L-1 layer\n\tfor l in range(1,L):\n\t\tA_pre = A\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\tz = np.dot(W,A_pre) + b #计算z = wx + b\n\t\tA = relu(z) #relu activation function\n\t\tcaches.append((W,b,z,A))\n\t# calculate Lth layer\n\tWL = parameters[\"W\" + str(L)]\n\tbL = parameters[\"b\" + str(L)]\n\tzL = np.dot(WL,A) + bL\n\tAL = sigmoid(zL)\n\tcaches.append((WL,bL,zL,AL))\n\treturn AL, caches\n#calculate cost function\n\ndef compute_cost(AL,Y):\n\t\"\"\"\n\t:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)\n\t:param Y:真实值,shape:(1, number of examples)\n\t:return:\n\t\"\"\"\n\tm = Y.shape[1]\n\tcost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) + np.multiply(-np.log(1 - AL), 1 - Y))\n\t#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2\n\tcost = np.squeeze(cost)\n\treturn cost\n\n\ndef compute_cost_with_regularization(AL, Y, parameters, lambd):\n\t\"\"\"\n\tImplement the cost function with L2 regularization. 
See formula (2) above.\n\tArguments:\n\tA3 -- post-activation, output of forward propagation, of shape (output size, number of examples)\n\tY -- \"true\" labels vector, of shape (output size, number of examples)\n\tparameters -- python dictionary containing parameters of the model\n\tReturns:\n\tcost - value of the regularized loss function\n\t\"\"\"\n\tm = Y.shape[1]\n\tcross_entropy_cost = compute_cost(AL, Y)  # This gives you the cross-entropy part of the cost\n\tL = len(parameters) // 2\n\tL2_regularization_cost = 0\n\tfor l in range(0,L):\n\t\tL2_regularization_cost += (1. / m) * (lambd / 2.) * (np.sum(np.square(parameters[\"W\" + str(l+1)])))\n\tcost = cross_entropy_cost + L2_regularization_cost\n\n\treturn cost\n\n\n# derivation of relu\ndef relu_backward(Z):\n\t\"\"\"\n\t:param Z: the input of activation\n\t:return:\n\t\"\"\"\n\tdA = np.int64(Z > 0)\n\treturn dA\n\ndef backward_propagation_with_regularization(AL, Y, caches, lambd):\n\t\"\"\"\n\tImplement the backward propagation presented in figure 2.\n\tArguments:\n\tAL: the output of last layer , i.e predict\n\tY -- true \"label\" vector (containing 0 if cat, 1 if non-cat)\n\tcaches -- caches output from forward_propagation(),(W,b,z,A)\n\n\tReturns:\n\tgradients -- A dictionary with the gradients with respect to dW,db\n\t\"\"\"\n\tm = Y.shape[1]\n\tL = len(caches) - 1\n\t# print(\"L:   \" + str(L))\n\t#calculate the Lth layer gradients\n\tprev_AL = caches[L-1][3]\n\tdzL = 1./m * (AL - Y)\n\t# print(dzL.shape)\n\t# print(prev_AL.T.shape)\n\tdWL = np.dot(dzL, prev_AL.T) + lambd/m * caches[L][0]\n\tdbL = np.sum(dzL, axis=1, keepdims=True)\n\tgradients = {\"dW\" + str(L): dWL, \"db\" + str(L): dbL}\n\t#calculate from L-1 to 1 layer gradients\n\tfor l in reversed(range(1,L)): # L-1,L-3,....,1\n\t\tpost_W= caches[l+1][0] #要用后一层的W\n\t\tdz = dzL #用后一层的dz\n\t\tdal = np.dot(post_W.T, dz)\n\t\tz = caches[l][2]#当前层的z\n\t\tdzl = np.multiply(dal, relu_backward(z))#可以直接用dzl = np.multiply(dal, np.int64(Al > 0))来实现\n\t\tprev_A = caches[l-1][3]#前一层的A\n\t\tdWl = np.dot(dzl, prev_A.T) + lambd/m * caches[l][0]\n\t\tdbl = np.sum(dzl, axis=1, keepdims=True)\n\n\t\tgradients[\"dW\" + str(l)] = dWl\n\t\tgradients[\"db\" + str(l)] = dbl\n\t\tdzL = dzl #更新dz\n\treturn gradients\n\n\ndef update_parameters(parameters, grads, learning_rate):\n\t\"\"\"\n\t:param parameters: dictionary,  W,b\n\t:param grads: dW,db\n\t:param learning_rate: alpha\n\t:return:\n\t\"\"\"\n\tL = len(parameters) // 2\n\tfor l in range(L):\n\t\tparameters[\"W\" + str(l + 1)] = parameters[\"W\" + str(l + 1)] - learning_rate * grads[\"dW\" + str(l+1)]\n\t\tparameters[\"b\" + str(l + 1)] = parameters[\"b\" + str(l + 1)] - learning_rate * grads[\"db\" + str(l+1)]\n\treturn parameters\n\ndef L_layer_model(X, Y, layer_dims, learning_rate, num_iterations,lambd):\n\t\"\"\"\n\t:param X:\n\t:param Y:\n\t:param layer_dims:list containing the input size and each layer size\n\t:param learning_rate:\n\t:param num_iterations:\n\t:return:\n\tparameters：final parameters:(W,b)\n\t\"\"\"\n\tcosts = []\n\t# initialize parameters\n\tparameters = initialize_parameters(layer_dims)\n\tfor i in range(0, num_iterations):\n\t\t#foward propagation\n\t\tAL,caches = forward_propagation(X, parameters)\n\t\t# calculate the cost\n\t\tcost = compute_cost_with_regularization(AL, Y, parameters, lambd)\n\t\tif i % 1000 == 0:\n\t\t\tprint(\"Cost after iteration {}: {}\".format(i, cost))\n\t\t\tcosts.append(cost)\n\t\t#backward propagation\n\t\tgrads = backward_propagation_with_regularization(AL, Y, caches, 
lambd)\n\t\t#update parameters\n\t\tparameters = update_parameters(parameters, grads, learning_rate)\n\tprint('length of cost')\n\tprint(len(costs))\n\tplt.clf()\n\tplt.plot(costs)  # plot the cost curve\n\tplt.xlabel(\"iterations(thousand)\")  # x-axis label\n\tplt.ylabel(\"cost\")  # y-axis label\n\tplt.show()\n\treturn parameters\n\n#predict function\ndef predict(X_test,y_test,parameters):\n\t\"\"\"\n\t:param X_test: test set features\n\t:param y_test: test set labels\n\t:param parameters: trained parameters (W,b)\n\t:return: accuracy on the test set\n\t\"\"\"\n\tm = y_test.shape[1]\n\tY_prediction = np.zeros((1, m))\n\tprob, caches = forward_propagation(X_test,parameters)\n\tfor i in range(prob.shape[1]):\n\t\t# Convert probabilities A[0,i] to actual predictions p[0,i]\n\t\tif prob[0, i] > 0.5:\n\t\t\tY_prediction[0, i] = 1\n\t\telse:\n\t\t\tY_prediction[0, i] = 0\n\taccuracy = 1 - np.mean(np.abs(Y_prediction - y_test))\n\treturn accuracy\n#DNN model\ndef DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.001, num_iterations=20000, lambd = 0.):\n\tparameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations,lambd)\n\taccuracy = predict(X_test,y_test,parameters)\n\treturn accuracy\nif __name__ == \"__main__\":\n\tX_data, y_data = load_breast_cancer(return_X_y=True)\n\tX_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)\n\tX_train = X_train.T\n\ty_train = y_train.reshape(y_train.shape[0], -1).T\n\tX_test = X_test.T\n\ty_test = y_test.reshape(y_test.shape[0], -1).T\n\taccuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],10,5,1],lambd = 0.7)\n\tprint(accuracy)"
  },
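The regularized cost used above is the cross-entropy cost plus a penalty `lambd / (2m) * sum_l ||W_l||_F^2`; its only effect on backprop is the extra `lambd/m * W_l` term added to each `dW_l`, which is exactly the `lambd/m * caches[l][0]` term in `backward_propagation_with_regularization`. A small self-contained check of the penalty term (shapes, seed, and values are illustrative):

```python
import numpy as np

# Illustrative check of the L2 penalty added to the cross-entropy cost:
# J_reg = J + lambd / (2m) * sum_l ||W_l||_F^2
np.random.seed(0)
m, lambd = 4, 0.7
parameters = {"W1": np.random.randn(3, 2), "b1": np.zeros((3, 1)),
              "W2": np.random.randn(1, 3), "b2": np.zeros((1, 1))}
L2_term = (1. / m) * (lambd / 2.) * sum(
    np.sum(np.square(parameters["W" + str(l)])) for l in (1, 2))
print(L2_term)  # the amount added to the unregularized cost
```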
  {
    "path": "deep_neural_network_with_dropout.py",
    "content": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_selection import train_test_split\n\n\n#initialize parameters(w,b)\ndef initialize_parameters(layer_dims):\n\t\"\"\"\n\t:param layer_dims: list,每一层单元的个数（维度）\n\t:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL\n\t\"\"\"\n\tnp.random.seed(3)\n\tL = len(layer_dims)#the number of layers in the network\n\tparameters = {}\n\tfor l in range(1,L):\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization\n\t\t# parameters[\"W\" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果\n\t\tparameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization\n\t\tparameters[\"b\" + str(l)] = np.zeros((layer_dims[l],1))\n\treturn parameters\n\ndef relu(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\tA: output of activation\n\t\"\"\"\n\tA = np.maximum(0,Z)\n\treturn A\n#implement the activation function(ReLU and sigmoid)\ndef sigmoid(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\t\"\"\"\n\tA = 1 / (1 + np.exp(-Z))\n\treturn A\n\ndef forward_propagation(X, parameters):\n\t\"\"\"\n\tX -- input dataset, of shape (input size, number of examples)\n    parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"W2\", \"b2\",...,\"WL\", \"bL\"\n                    W -- weight matrix of shape (size of current layer, size of previous layer)\n                    b -- bias vector of shape (size of current layer,1)\n    :return:\n\tAL: the output of the last Layer(y_predict)\n\tcaches: list, every element is a tuple:(W,b,z,A_pre)\n\t\"\"\"\n\tL = len(parameters) // 2  # number of layer\n\tA = X\n\tcaches = [(None,None,None,X)]  # 第0层(None,None,None,A0) w,b,z用none填充,下标与层数一致，用于存储每一层的，w,b,z,A\n\t# calculate from 1 to L-1 layer\n\tfor l in range(1,L):\n\t\tA_pre = A\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\tz = np.dot(W,A_pre) + b #计算z = wx + b\n\t\tA = relu(z) #relu activation function\n\t\tcaches.append((W,b,z,A))\n\t# calculate Lth layer\n\tWL = parameters[\"W\" + str(L)]\n\tbL = parameters[\"b\" + str(L)]\n\tzL = np.dot(WL,A) + bL\n\tAL = sigmoid(zL)\n\tcaches.append((WL,bL,zL,AL))\n\treturn AL, caches\n\n\n#带dropout的深度神经网络\ndef forward_propagation_with_dropout(X, parameters, keep_prob = 0.8):\n\t\"\"\"\n\tX -- input dataset, of shape (input size, number of examples)\n    parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"W2\", \"b2\",...,\"WL\", \"bL\"\n                    W -- weight matrix of shape (size of current layer, size of previous layer)\n                    b -- bias vector of shape (size of current layer,1)\n    keep_prob: probability of keeping a neuron active during drop-out, scalar\n    :return:\n\tAL: the output of the last Layer(y_predict)\n\tcaches: list, every element is a tuple:(W,b,z,A_pre)\n\t\"\"\"\n\tnp.random.seed(1)\n\tL = len(parameters) // 2  # number of layer\n\tA = X\n\tcaches = [(None,None,None,X,None)]  #用于存储每一层的，w,b,z,A,D第0层w,b,z用none代替\n\t# calculate from 1 to L-1 layer\n\tfor l in range(1, L):\n\t\tA_pre = A\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\tz = np.dot(W, A_pre) + b  # 计算z = wx + b\n\t\tA = relu(z)  # relu activation function\n\t\tD = 
np.random.rand(A.shape[0], A.shape[1]) #initialize matrix D\n\t\tD = (D < keep_prob) #convert entries of D to 0 or 1 (using keep_prob as the threshold)\n\t\tA = np.multiply(A, D) #shut down some neurons of A\n\t\tA = A / keep_prob #scale the value of neurons that haven't been shut down\n\t\tcaches.append((W, b, z, A, D))\n\t# calculate Lth layer\n\tWL = parameters[\"W\" + str(L)]\n\tbL = parameters[\"b\" + str(L)]\n\tzL = np.dot(WL, A) + bL\n\tAL = sigmoid(zL)\n\tcaches.append((WL, bL, zL, AL, None)) # store AL (not the previous layer's A) and pad the mask slot so every cache is a 5-tuple\n\treturn AL, caches\n\n#calculate cost function\ndef compute_cost(AL,Y):\n\t\"\"\"\n\t:param AL: activations of the last layer, i.e. the predictions, shape:(1, number of examples)\n\t:param Y: ground-truth labels, shape:(1, number of examples)\n\t:return:\n\t\"\"\"\n\tm = Y.shape[1]\n\tcost = 1. / m * np.nansum(np.multiply(-np.log(AL), Y) + np.multiply(-np.log(1 - AL), 1 - Y))\n\t# squeeze removes single-dimensional entries from the shape, e.g. turns [[[2]]] into 2\n\tcost = np.squeeze(cost)\n\treturn cost\n\n# derivative of relu\ndef relu_backward(Z):\n\t\"\"\"\n\t:param Z: the input of activation\n\t:return:\n\t\"\"\"\n\tdA = np.int64(Z > 0)\n\treturn dA\n\n# backward propagation with dropout\ndef backward_propagation_with_dropout(AL, Y, caches, keep_prob = 0.8):\n\t\"\"\"\n\t\tImplement the backward propagation for the dropout network.\n\t\tArguments:\n\t\tAL -- the output of the last layer, i.e. the predictions\n\t\tY -- true \"label\" vector (containing 0 if cat, 1 if non-cat)\n\t\tcaches -- caches output from forward_propagation_with_dropout(),(W,b,z,A,D)\n\n\t\tReturns:\n\t\tgradients -- A dictionary with the gradients with respect to dW,db\n\t\t\"\"\"\n\tm = Y.shape[1]\n\tL = len(caches) - 1\n\t# calculate the Lth layer gradients\n\tprev_AL = caches[L - 1][3]\n\tdzL = 1. / m * (AL - Y)\n\tdWL = np.dot(dzL, prev_AL.T)\n\tdbL = np.sum(dzL, axis=1, keepdims=True)\n\tgradients = {\"dW\" + str(L): dWL, \"db\" + str(L): dbL}\n\t# calculate from L-1 to 1 layer gradients\n\tfor l in reversed(range(1, L)): # L-1,L-2,...,1\n\t\tpost_W = caches[l + 1][0]  # use W of the next (l+1) layer\n\t\tdz = dzL  # use dz of the next layer\n\n\t\tdal = np.dot(post_W.T, dz)\n\t\tDl = caches[l][4] # dropout mask D of the current layer\n\t\tdal = np.multiply(dal, Dl)#Apply mask Dl to shut down the same neurons as during the forward propagation\n\t\tdal = dal / keep_prob #Scale the value of neurons that haven't been shut down\n\t\tz = caches[l][2]  # z of the current layer\n\t\tdzl = np.multiply(dal, relu_backward(z))\n\t\tprev_A = caches[l-1][3]  # A of the previous layer\n\t\tdWl = np.dot(dzl, prev_A.T)\n\t\tdbl = np.sum(dzl, axis=1, keepdims=True)\n\n\t\tgradients[\"dW\" + str(l)] = dWl\n\t\tgradients[\"db\" + str(l)] = dbl\n\t\tdzL = dzl  # pass dz down to the next (shallower) layer\n\treturn gradients\n\ndef update_parameters(parameters, grads, learning_rate):\n\t\"\"\"\n\t:param parameters: dictionary,  W,b\n\t:param grads: dW,db\n\t:param learning_rate: alpha\n\t:return:\n\t\"\"\"\n\tL = len(parameters) // 2\n\tfor l in range(L):\n\t\tparameters[\"W\" + str(l + 1)] = parameters[\"W\" + str(l + 1)] - learning_rate * grads[\"dW\" + str(l+1)]\n\t\tparameters[\"b\" + str(l + 1)] = parameters[\"b\" + str(l + 1)] - learning_rate * grads[\"db\" + str(l+1)]\n\treturn parameters\n\ndef L_layer_model(X, Y, layer_dims, learning_rate, num_iterations,keep_prob):\n\t\"\"\"\n\t:param X:\n\t:param Y:\n\t:param layer_dims:list containing the input size and each layer size\n\t:param learning_rate:\n\t:param num_iterations:\n\t:return:\n\tparameters: final parameters:(W,b)\n\t\"\"\"\n\tcosts = []\n\t# initialize parameters\n\tparameters = initialize_parameters(layer_dims)\n\tfor i in range(0, num_iterations):\n\t\t#forward propagation\n\t\tAL,caches = forward_propagation_with_dropout(X, parameters, keep_prob)\n\t\t# calculate the cost\n\t\tcost = compute_cost(AL, Y)\n\t\tif i % 1000 == 0:\n\t\t\tprint(\"Cost after iteration {}: {}\".format(i, cost))\n\t\t\tcosts.append(cost)\n\t\t#backward propagation\n\t\tgrads = backward_propagation_with_dropout(AL, Y, caches, keep_prob)\n\t\t#update parameters\n\t\tparameters = update_parameters(parameters, grads, learning_rate)\n\tprint('length of cost')\n\tprint(len(costs))\n\tplt.clf()\n\tplt.plot(costs)  # plot the cost curve\n\tplt.xlabel(\"iterations(thousand)\")  # x-axis label\n\tplt.ylabel(\"cost\")  # y-axis label\n\tplt.show()\n\treturn parameters\n\n#predict function\ndef predict(X_test,y_test,parameters):\n\t\"\"\"\n\t:param X_test: test set features\n\t:param y_test: test set labels\n\t:param parameters: trained parameters (W,b)\n\t:return: accuracy on the test set\n\t\"\"\"\n\tm = y_test.shape[1]\n\tY_prediction = np.zeros((1, m))\n\tprob, caches = forward_propagation(X_test,parameters)\n\tfor i in range(prob.shape[1]):\n\t\t# Convert probabilities A[0,i] to actual predictions p[0,i]\n\t\tif prob[0, i] > 0.5:\n\t\t\tY_prediction[0, i] = 1\n\t\telse:\n\t\t\tY_prediction[0, i] = 0\n\taccuracy = 1 - np.mean(np.abs(Y_prediction - y_test))\n\treturn accuracy\n#DNN model\ndef DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.001, num_iterations=20000, keep_prob = 1.):\n\tparameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations, keep_prob)\n\taccuracy = predict(X_test,y_test,parameters)\n\treturn accuracy\nif __name__ == \"__main__\":\n\tX_data, y_data = load_breast_cancer(return_X_y=True)\n\tX_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)\n\tX_train = X_train.T\n\ty_train = y_train.reshape(y_train.shape[0], -1).T\n\tX_test = X_test.T\n\ty_test = y_test.reshape(y_test.shape[0], -1).T\n\taccuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],10,5,1], keep_prob = 0.86)\n\tprint(accuracy)"
  },
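Inverted dropout, as implemented above, keeps each unit with probability `keep_prob`, zeros the rest, and rescales the survivors by `1/keep_prob` so the expected activation is unchanged at training time; the same mask `D` must then be reused in backprop. A tiny standalone sketch (values are illustrative):

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.8
A = np.random.rand(3, 4)                   # activations of one hidden layer
D = np.random.rand(*A.shape) < keep_prob   # boolean mask, True = keep the unit
A = (A * D) / keep_prob                    # zero dropped units, rescale the rest
# in backprop the incoming dA gets the same treatment: dA = (dA * D) / keep_prob
print(D.astype(int))
```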
  {
    "path": "deep_neural_network_with_gd.py",
    "content": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_selection import train_test_split\n#initialize parameters(w,b)\ndef initialize_parameters(layer_dims):\n\t\"\"\"\n\t:param layer_dims: list,每一层单元的个数（维度）\n\t:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL\n\t\"\"\"\n\tnp.random.seed(3)\n\tL = len(layer_dims)#the number of layers in the network\n\tparameters = {}\n\tfor l in range(1,L):\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01\n\t\tparameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization\n\t\t# parameters[\"W\" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization\n\t\tparameters[\"b\" + str(l)] = np.zeros((layer_dims[l],1))\n\treturn parameters\ndef relu(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\tA: output of activation\n\t\"\"\"\n\tA = np.maximum(0,Z)\n\treturn A\n#implement the activation function(ReLU and sigmoid)\ndef sigmoid(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\t\"\"\"\n\tA = 1 / (1 + np.exp(-Z))\n\treturn A\n\ndef forward_propagation(X, parameters):\n\t\"\"\"\n\tX -- input dataset, of shape (input size, number of examples)\n    parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"W2\", \"b2\",...,\"WL\", \"bL\"\n                    W -- weight matrix of shape (size of current layer, size of previous layer)\n                    b -- bias vector of shape (size of current layer,1)\n    :return:\n\tAL: the output of the last Layer(y_predict)\n\tcaches: list, every element is a tuple:(W,b,z,A_pre)\n\t\"\"\"\n\tL = len(parameters) // 2  # number of layer\n\tA = X\n\tcaches = [(None,None,None,X)]  # 第0层(None,None,None,A0) w,b,z用none填充,下标与层数一致，用于存储每一层的，w,b,z,A\n\t# calculate from 1 to L-1 layer\n\tfor l in range(1,L):\n\t\tA_pre = A\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\tz = np.dot(W,A_pre) + b #计算z = wx + b\n\t\tA = relu(z) #relu activation function\n\t\tcaches.append((W,b,z,A))\n\t# calculate Lth layer\n\tWL = parameters[\"W\" + str(L)]\n\tbL = parameters[\"b\" + str(L)]\n\tzL = np.dot(WL,A) + bL\n\tAL = sigmoid(zL)\n\tcaches.append((WL,bL,zL,AL))\n\treturn AL, caches\n#calculate cost function\ndef compute_cost(AL,Y):\n\t\"\"\"\n\t:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)\n\t:param Y:真实值,shape:(1, number of examples)\n\t:return:\n\t\"\"\"\n\tm = Y.shape[1]\n\t# cost = -1.0/m * np.sum(Y*np.log(AL)+(1-Y)*np.log(1.0 - AL))#py中*是点乘\n\t# cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T)) #推荐用这个，上面那个容易出错\n\tcost = 1. 
/ m * np.nansum(np.multiply(-np.log(AL), Y) +\n\t                          np.multiply(-np.log(1 - AL), 1 - Y))\n\t#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2\n\tcost = np.squeeze(cost)\n\t# print('=====================cost===================')\n\t# print(cost)\n\treturn cost\n\t\n# derivation of relu\ndef relu_backward(Z):\n\t\"\"\"\n\t:param Z: the input of activation\n\t:return:\n\t\"\"\"\n\tdA = np.int64(Z > 0)\n\treturn dA\n\ndef backward_propagation(AL, Y, caches):\n\t\"\"\"\n\tImplement the backward propagation presented in figure 2.\n\tArguments:\n\tX -- input dataset, of shape (input size, number of examples)\n\tY -- true \"label\" vector (containing 0 if cat, 1 if non-cat)\n\tcaches -- caches output from forward_propagation(),(W,b,z,pre_A)\n\n\tReturns:\n\tgradients -- A dictionary with the gradients with respect to dW,db\n\t\"\"\"\n\tm = Y.shape[1]\n\tL = len(caches) - 1\n\t# print(\"L:   \" + str(L))\n\t#calculate the Lth layer gradients\n\tprev_AL = caches[L-1][3]\n\tdzL = 1./m * (AL - Y)\n\t# print(dzL.shape)\n\t# print(prev_AL.T.shape)\n\tdWL = np.dot(dzL, prev_AL.T)\n\tdbL = np.sum(dzL, axis=1, keepdims=True)\n\tgradients = {\"dW\"+str(L):dWL, \"db\"+str(L):dbL}\n\t#calculate from L-1 to 1 layer gradients\n\tfor l in reversed(range(1,L)): # L-1,L-3,....,1\n\t\tpost_W= caches[l+1][0] #要用后一层的W\n\t\tdz = dzL #用后一层的dz\n\n\t\tdal = np.dot(post_W.T, dz)\n\t\tz = caches[l][2]#当前层的z\n\t\tdzl = np.multiply(dal, relu_backward(z))\n\t\tprev_A = caches[l-1][3]#前一层的A\n\t\tdWl = np.dot(dzl, prev_A.T)\n\t\tdbl = np.sum(dzl, axis=1, keepdims=True)\n\n\t\tgradients[\"dW\" + str(l)] = dWl\n\t\tgradients[\"db\" + str(l)] = dbl\n\t\tdzL = dzl #更新dz\n\treturn gradients\n\ndef update_parameters(parameters, grads, learning_rate):\n\t\"\"\"\n\t:param parameters: dictionary,  W,b\n\t:param grads: dW,db\n\t:param learning_rate: alpha\n\t:return:\n\t\"\"\"\n\tL = len(parameters) // 2\n\tfor l in range(L):\n\t\tparameters[\"W\" + str(l + 1)] = parameters[\"W\" + str(l + 1)] - learning_rate * grads[\"dW\" + str(l+1)]\n\t\tparameters[\"b\" + str(l + 1)] = parameters[\"b\" + str(l + 1)] - learning_rate * grads[\"db\" + str(l+1)]\n\treturn parameters\n\n\ndef random_mini_batches(X, Y, mini_batch_size = 64, seed=1):\n\t\"\"\"\n\tCreates a list of random minibatches from (X, Y)\n\tArguments:\n\tX -- input data, of shape (input size, number of examples)\n\tY -- true \"label\" vector (1 for blue dot / 0 for red dot), of shape (1, number of examples)\n\tmini_batch_size -- size of the mini-batches, integer\n\n\tReturns:\n\tmini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)\n\t\"\"\"\n\tnp.random.seed(seed)\n\tm = X.shape[1]  # number of training examples\n\tmini_batches = []\n\n\t# Step 1: Shuffle (X, Y)\n\tpermutation = list(np.random.permutation(m))\n\tshuffled_X = X[:, permutation]\n\tshuffled_Y = Y[:, permutation].reshape((1, m))\n\n\t# Step 2: Partition (shuffled_X, shuffled_Y). 
Minus the end case.\n\tnum_complete_minibatches = m // mini_batch_size  # number of mini batches of size mini_batch_size in your partitionning\n\tfor k in range(0, num_complete_minibatches):\n\t\tmini_batch_X = shuffled_X[:, k * mini_batch_size: (k + 1) * mini_batch_size]\n\t\tmini_batch_Y = shuffled_Y[:, k * mini_batch_size: (k + 1) * mini_batch_size]\n\t\tmini_batch = (mini_batch_X, mini_batch_Y)\n\t\tmini_batches.append(mini_batch)\n\n\t# Handling the end case (last mini-batch < mini_batch_size)\n\tif m % mini_batch_size != 0:\n\t\tmini_batch_X = shuffled_X[:, num_complete_minibatches * mini_batch_size: m]\n\t\tmini_batch_Y = shuffled_Y[:, num_complete_minibatches * mini_batch_size: m]\n\t\tmini_batch = (mini_batch_X, mini_batch_Y)\n\t\tmini_batches.append(mini_batch)\n\n\treturn mini_batches\n\ndef L_layer_model(X, Y, layer_dims, learning_rate, num_iterations, gradient_descent = 'bgd',mini_batch_size = 64):\n\t\"\"\"\n\t:param X:\n\t:param Y:\n\t:param layer_dims:list containing the input size and each layer size\n\t:param learning_rate:\n\t:param num_iterations:\n\t:return:\n\tparameters：final parameters:(W,b)\n\t\"\"\"\n\tm = Y.shape[1]\n\tcosts = []\n\t# initialize parameters\n\tparameters = initialize_parameters(layer_dims)\n\tif gradient_descent =='bgd':\n\t\tfor i in range(0, num_iterations):\n\t\t\t#foward propagation\n\t\t\tAL,caches = forward_propagation(X, parameters)\n\t\t\t# calculate the cost\n\t\t\tcost = compute_cost(AL, Y)\n\t\t\tif i % 1000 == 0:\n\t\t\t\tprint(\"Cost after iteration {}: {}\".format(i, cost))\n\t\t\t\tcosts.append(cost)\n\t\t\t#backward propagation\n\t\t\tgrads = backward_propagation(AL, Y, caches)\n\t\t\t#update parameters\n\t\t\tparameters = update_parameters(parameters, grads, learning_rate)\n\telif gradient_descent == 'sgd':\n\t\tnp.random.seed(3)\n\t\t# 把数据集打乱，这个很重要\n\t\tpermutation = list(np.random.permutation(m))\n\t\tshuffled_X = X[:, permutation]\n\t\tshuffled_Y = Y[:, permutation].reshape((1, m))\n\t\tfor i in range(0, num_iterations):\n\t\t\tfor j in range(0, m):  # 每次训练一个样本\n\t\t\t\t# Forward propagation\n\t\t\t\tAL,caches = forward_propagation(shuffled_X[:, j].reshape(-1,1), parameters)\n\t\t\t\t# Compute cost\n\t\t\t\tcost = compute_cost(AL, shuffled_Y[:, j].reshape(1,1))\n\t\t\t\t# Backward propagation\n\t\t\t\tgrads = backward_propagation(AL, shuffled_Y[:,j].reshape(1,1), caches)\n\t\t\t\t# Update parameters.\n\t\t\t\tparameters = update_parameters(parameters, grads, learning_rate)\n\t\t\t\tif j % 20 == 0:\n\t\t\t\t\tprint(\"example size {}: {}\".format(j, cost))\n\t\t\t\t\tcosts.append(cost)\n\telif gradient_descent == 'mini-batch':\n\t\tseed = 0\n\t\tfor i in range(0, num_iterations):\n\t\t\t# Define the random minibatches. 
We increment the seed to reshuffle differently the dataset after each epoch\n\t\t\tseed = seed + 1\n\t\t\tminibatches = random_mini_batches(X, Y, mini_batch_size, seed)\n\t\t\tfor minibatch in minibatches:\n\t\t\t\t# Select a minibatch\n\t\t\t\t(minibatch_X, minibatch_Y) = minibatch\n\t\t\t\t# Forward propagation\n\t\t\t\tAL, caches = forward_propagation(minibatch_X, parameters)\n\t\t\t\t# Compute cost\n\t\t\t\tcost = compute_cost(AL, minibatch_Y)\n\t\t\t\t# Backward propagation\n\t\t\t\tgrads = backward_propagation(AL, minibatch_Y, caches)\n\t\t\t\tparameters = update_parameters(parameters, grads, learning_rate)\n\t\t\tif i % 100 == 0:\n\t\t\t\tprint(\"Cost after iteration {}: {}\".format(i, cost))\n\t\t\t\tcosts.append(cost)\n\tprint('length of cost')\n\tprint(len(costs))\n\tplt.clf()\n\tplt.plot(costs)\n\tplt.xlabel(\"iterations(hundred)\")  # 横坐标名字\n\tplt.ylabel(\"cost\")  # 纵坐标名字\n\tplt.show()\n\treturn parameters\n\n#predict function\ndef predict(X_test,y_test,parameters):\n\t\"\"\"\n\t:param X:\n\t:param y:\n\t:param parameters:\n\t:return:\n\t\"\"\"\n\tm = y_test.shape[1]\n\tY_prediction = np.zeros((1, m))\n\tprob, caches = forward_propagation(X_test,parameters)\n\tfor i in range(prob.shape[1]):\n\t\t# Convert probabilities A[0,i] to actual predictions p[0,i]\n\t\tif prob[0, i] > 0.5:\n\t\t\tY_prediction[0, i] = 1\n\t\telse:\n\t\t\tY_prediction[0, i] = 0\n\taccuracy = 1- np.mean(np.abs(Y_prediction - y_test))\n\treturn accuracy\n#DNN model\ndef DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0006, num_iterations=30000, gradient_descent = 'bgd',mini_batch_size = 64):\n\tparameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations,gradient_descent,mini_batch_size)\n\taccuracy = predict(X_test,y_test,parameters)\n\treturn accuracy\n\nif __name__ == \"__main__\":\n\tX_data, y_data = load_breast_cancer(return_X_y=True)\n\tX_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)\n\tX_train = X_train.T\n\ty_train = y_train.reshape(y_train.shape[0], -1).T\n\tX_test = X_test.T\n\ty_test = y_test.reshape(y_test.shape[0], -1).T\n\t#use bgd\n\taccuracy = DNN(X_train,y_train,X_test,y_test,[X_train.shape[0],10,5,1])\n\tprint(accuracy)\n\t#use sgd\n\taccuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1],num_iterations=5, gradient_descent = 'sgd')\n\tprint(accuracy)\n\t#mini-batch\n\taccuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], num_iterations=10000,gradient_descent='mini-batch')\n\tprint(accuracy)"
  },
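`random_mini_batches` above shuffles the `m` example columns once per call and slices them into batches of `mini_batch_size`, appending one smaller end batch when `m` is not an exact multiple. A usage sketch, assuming the `random_mini_batches` defined in the file above (the data here is random and purely illustrative):

```python
import numpy as np

# 130 examples with batch size 64 yield two full batches plus one end batch of 2.
X = np.random.randn(5, 130)                        # (input size, number of examples)
Y = (np.random.rand(1, 130) > 0.5).astype(np.float64)
mini_batches = random_mini_batches(X, Y, mini_batch_size=64, seed=1)
print(len(mini_batches))                           # -> 3
print([mb_X.shape[1] for mb_X, _ in mini_batches]) # -> [64, 64, 2]
```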
  {
    "path": "deep_neural_network_with_optimizers.py",
    "content": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_selection import train_test_split\n#initialize parameters(w,b)\ndef initialize_parameters(layer_dims):\n\t\"\"\"\n\t:param layer_dims: list,每一层单元的个数（维度）\n\t:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL\n\t\"\"\"\n\tnp.random.seed(3)\n\tL = len(layer_dims)#the number of layers in the network\n\tparameters = {}\n\tfor l in range(1,L):\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01\n\t\tparameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization\n\t\t# parameters[\"W\" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization\n\t\tparameters[\"b\" + str(l)] = np.zeros((layer_dims[l],1))\n\treturn parameters\ndef relu(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\tA: output of activation\n\t\"\"\"\n\tA = np.maximum(0,Z)\n\treturn A\n#implement the activation function(ReLU and sigmoid)\ndef sigmoid(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\t\"\"\"\n\tA = 1 / (1 + np.exp(-Z))\n\treturn A\n\ndef forward_propagation(X, parameters):\n\t\"\"\"\n\tX -- input dataset, of shape (input size, number of examples)\n    parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"W2\", \"b2\",...,\"WL\", \"bL\"\n                    W -- weight matrix of shape (size of current layer, size of previous layer)\n                    b -- bias vector of shape (size of current layer,1)\n    :return:\n\tAL: the output of the last Layer(y_predict)\n\tcaches: list, every element is a tuple:(W,b,z,A_pre)\n\t\"\"\"\n\tL = len(parameters) // 2  # number of layer\n\tA = X\n\tcaches = [(None,None,None,X)]  # 第0层(None,None,None,A0) w,b,z用none填充,下标与层数一致，用于存储每一层的，w,b,z,A\n\t# calculate from 1 to L-1 layer\n\tfor l in range(1,L):\n\t\tA_pre = A\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\tz = np.dot(W,A_pre) + b #计算z = wx + b\n\t\tA = relu(z) #relu activation function\n\t\tcaches.append((W,b,z,A))\n\t# calculate Lth layer\n\tWL = parameters[\"W\" + str(L)]\n\tbL = parameters[\"b\" + str(L)]\n\tzL = np.dot(WL,A) + bL\n\tAL = sigmoid(zL)\n\tcaches.append((WL,bL,zL,AL))\n\treturn AL, caches\n#calculate cost function\ndef compute_cost(AL,Y):\n\t\"\"\"\n\t:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)\n\t:param Y:真实值,shape:(1, number of examples)\n\t:return:\n\t\"\"\"\n\tm = Y.shape[1]\n\t# cost = -1.0/m * np.sum(Y*np.log(AL)+(1-Y)*np.log(1.0 - AL))#py中*是点乘\n\t# cost = (1. / m) * (-np.dot(Y, np.log(AL).T) - np.dot(1 - Y, np.log(1 - AL).T)) #推荐用这个，上面那个容易出错\n\tcost = 1. 
/ m * np.nansum(np.multiply(-np.log(AL), Y) +\n\t                          np.multiply(-np.log(1 - AL), 1 - Y))\n\t#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2\n\tcost = np.squeeze(cost)\n\t# print('=====================cost===================')\n\t# print(cost)\n\treturn cost\n\n# derivation of relu\ndef relu_backward(Z):\n\t\"\"\"\n\t:param Z: the input of activation\n\t:return:\n\t\"\"\"\n\tdA = np.int64(Z > 0)\n\treturn dA\n\ndef backward_propagation(AL, Y, caches):\n\t\"\"\"\n\tImplement the backward propagation presented in figure 2.\n\tArguments:\n\tX -- input dataset, of shape (input size, number of examples)\n\tY -- true \"label\" vector (containing 0 if cat, 1 if non-cat)\n\tcaches -- caches output from forward_propagation(),(W,b,z,pre_A)\n\n\tReturns:\n\tgradients -- A dictionary with the gradients with respect to dW,db\n\t\"\"\"\n\tm = Y.shape[1]\n\tL = len(caches) - 1\n\t# print(\"L:   \" + str(L))\n\t#calculate the Lth layer gradients\n\tprev_AL = caches[L-1][3]\n\tdzL = 1./m * (AL - Y)\n\t# print(dzL.shape)\n\t# print(prev_AL.T.shape)\n\tdWL = np.dot(dzL, prev_AL.T)\n\tdbL = np.sum(dzL, axis=1, keepdims=True)\n\tgradients = {\"dW\"+str(L):dWL, \"db\"+str(L):dbL}\n\t#calculate from L-1 to 1 layer gradients\n\tfor l in reversed(range(1,L)): # L-1,L-3,....,1\n\t\tpost_W= caches[l+1][0] #要用后一层的W\n\t\tdz = dzL #用后一层的dz\n\n\t\tdal = np.dot(post_W.T, dz)\n\t\tz = caches[l][2]#当前层的z\n\t\tdzl = np.multiply(dal, relu_backward(z))\n\t\tprev_A = caches[l-1][3]#前一层的A\n\t\tdWl = np.dot(dzl, prev_A.T)\n\t\tdbl = np.sum(dzl, axis=1, keepdims=True)\n\n\t\tgradients[\"dW\" + str(l)] = dWl\n\t\tgradients[\"db\" + str(l)] = dbl\n\t\tdzL = dzl #更新dz\n\treturn gradients\n\ndef update_parameters_with_gd(parameters, grads, learning_rate):\n\t\"\"\"\n\t:param parameters: dictionary,  W,b\n\t:param grads: dW,db\n\t:param learning_rate: alpha\n\t:return:\n\t\"\"\"\n\tL = len(parameters) // 2\n\tfor l in range(L):\n\t\tparameters[\"W\" + str(l + 1)] = parameters[\"W\" + str(l + 1)] - learning_rate * grads[\"dW\" + str(l+1)]\n\t\tparameters[\"b\" + str(l + 1)] = parameters[\"b\" + str(l + 1)] - learning_rate * grads[\"db\" + str(l+1)]\n\treturn parameters\n\n\ndef random_mini_batches(X, Y, mini_batch_size = 64, seed=1):\n\t\"\"\"\n\tCreates a list of random minibatches from (X, Y)\n\tArguments:\n\tX -- input data, of shape (input size, number of examples)\n\tY -- true \"label\" vector (1 for blue dot / 0 for red dot), of shape (1, number of examples)\n\tmini_batch_size -- size of the mini-batches, integer\n\n\tReturns:\n\tmini_batches -- list of synchronous (mini_batch_X, mini_batch_Y)\n\t\"\"\"\n\tnp.random.seed(seed)\n\tm = X.shape[1]  # number of training examples\n\tmini_batches = []\n\n\t# Step 1: Shuffle (X, Y)\n\tpermutation = list(np.random.permutation(m))\n\tshuffled_X = X[:, permutation]\n\tshuffled_Y = Y[:, permutation].reshape((1, m))\n\n\t# Step 2: Partition (shuffled_X, shuffled_Y). 
Minus the end case.\n\tnum_complete_minibatches = m // mini_batch_size  # number of mini batches of size mini_batch_size in your partitionning\n\tfor k in range(0, num_complete_minibatches):\n\t\tmini_batch_X = shuffled_X[:, k * mini_batch_size: (k + 1) * mini_batch_size]\n\t\tmini_batch_Y = shuffled_Y[:, k * mini_batch_size: (k + 1) * mini_batch_size]\n\t\tmini_batch = (mini_batch_X, mini_batch_Y)\n\t\tmini_batches.append(mini_batch)\n\n\t# Handling the end case (last mini-batch < mini_batch_size)\n\tif m % mini_batch_size != 0:\n\t\tmini_batch_X = shuffled_X[:, num_complete_minibatches * mini_batch_size: m]\n\t\tmini_batch_Y = shuffled_Y[:, num_complete_minibatches * mini_batch_size: m]\n\t\tmini_batch = (mini_batch_X, mini_batch_Y)\n\t\tmini_batches.append(mini_batch)\n\n\treturn mini_batches\n\n\ndef initialize_velocity(parameters):\n\t\"\"\"\n\tInitializes the velocity as a python dictionary with:\n\t\t\t\t- keys: \"dW1\", \"db1\", ..., \"dWL\", \"dbL\"\n\t\t\t\t- values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters.\n\tArguments:\n\tparameters -- python dictionary containing your parameters.\n\t\t\t\t\tparameters['W' + str(l)] = Wl\n\t\t\t\t\tparameters['b' + str(l)] = bl\n\n\tReturns:\n\tv -- python dictionary containing the current velocity.\n\t\t\t\t\tv['dW' + str(l)] = velocity of dWl\n\t\t\t\t\tv['db' + str(l)] = velocity of dbl\n\t\"\"\"\n\tL = len(parameters) // 2  # number of layers in the neural networks\n\tv = {}\n\t# Initialize velocity\n\tfor l in range(L):\n\t\tv[\"dW\" + str(l + 1)] = np.zeros(parameters[\"W\" + str(l + 1)].shape)\n\t\tv[\"db\" + str(l + 1)] = np.zeros(parameters[\"b\" + str(l + 1)].shape)\n\treturn v\n\n#momentum\ndef update_parameters_with_momentum(parameters, grads, v, beta, learning_rate):\n\t\"\"\"\n\tUpdate parameters using Momentum\n\tArguments:\n\tparameters -- python dictionary containing your parameters:\n\t\t\t\t\tparameters['W' + str(l)] = Wl\n\t\t\t\t\tparameters['b' + str(l)] = bl\n\tgrads -- python dictionary containing your gradients for each parameters:\n\t\t\t\t\tgrads['dW' + str(l)] = dWl\n\t\t\t\t\tgrads['db' + str(l)] = dbl\n\tv -- python dictionary containing the current velocity:\n\t\t\t\t\tv['dW' + str(l)] = ...\n\t\t\t\t\tv['db' + str(l)] = ...\n\tbeta -- the momentum hyperparameter, scalar\n\tlearning_rate -- the learning rate, scalar\n\n\tReturns:\n\tparameters -- python dictionary containing your updated parameters\n\n\t'''\n\tVdW = beta * VdW + (1-beta) * dW\n\tVdb = beta * Vdb + (1-beta) * db\n\tW = W - learning_rate * VdW\n\tb = b - learning_rate * Vdb\n\t'''\n\t\"\"\"\n\n\n\tL = len(parameters) // 2  # number of layers in the neural networks\n\n\t# Momentum update for each parameter\n\tfor l in range(L):\n\t\t# compute velocities\n\t\tv[\"dW\" + str(l + 1)] = beta * v[\"dW\" + str(l + 1)] + (1 - beta) * grads['dW' + str(l + 1)]\n\t\tv[\"db\" + str(l + 1)] = beta * v[\"db\" + str(l + 1)] + (1 - beta) * grads['db' + str(l + 1)]\n\t\t# update parameters\n\t\tparameters[\"W\" + str(l + 1)] = parameters[\"W\" + str(l + 1)] - learning_rate * v[\"dW\" + str(l + 1)]\n\t\tparameters[\"b\" + str(l + 1)] = parameters[\"b\" + str(l + 1)] - learning_rate * v[\"db\" + str(l + 1)]\n\n\treturn parameters\n\n#nesterov momentum\ndef update_parameters_with_nesterov_momentum(parameters, grads, v, beta, learning_rate):\n\t\"\"\"\n\tUpdate parameters using Momentum\n\tArguments:\n\tparameters -- python dictionary containing your parameters:\n\t\t\t\t\tparameters['W' + str(l)] = Wl\n\t\t\t\t\tparameters['b' 
+ str(l)] = bl\n\tgrads -- python dictionary containing your gradients for each parameters:\n\t\t\t\t\tgrads['dW' + str(l)] = dWl\n\t\t\t\t\tgrads['db' + str(l)] = dbl\n\tv -- python dictionary containing the current velocity:\n\t\t\t\t\tv['dW' + str(l)] = ...\n\t\t\t\t\tv['db' + str(l)] = ...\n\tbeta -- the momentum hyperparameter, scalar\n\tlearning_rate -- the learning rate, scalar\n\n\tReturns:\n\tparameters -- python dictionary containing your updated parameters\n\tv -- python dictionary containing your updated velocities\n\n\t'''\n\tVdW = beta * VdW - learning_rate * dW\n\tVdb = beta * Vdb - learning_rate * db\n\tW = W + beta * VdW - learning_rate * dW\n\tb = b + beta * Vdb - learning_rate * db\n\t'''\n\t\"\"\"\n\n\tL = len(parameters) // 2  # number of layers in the neural networks\n\n\t# Momentum update for each parameter\n\tfor l in range(L):\n\t\t# compute velocities\n\t\tv[\"dW\" + str(l + 1)] = beta * v[\"dW\" + str(l + 1)] - learning_rate * grads['dW' + str(l + 1)]\n\t\tv[\"db\" + str(l + 1)] = beta * v[\"db\" + str(l + 1)] - learning_rate * grads['db' + str(l + 1)]\n\t\t# update parameters\n\t\tparameters[\"W\" + str(l + 1)] += beta * v[\"dW\" + str(l + 1)]- learning_rate * grads['dW' + str(l + 1)]\n\t\tparameters[\"b\" + str(l + 1)] += beta * v[\"db\" + str(l + 1)] - learning_rate * grads[\"db\" + str(l + 1)]\n\n\treturn parameters\n\n\n#AdaGrad initialization\ndef initialize_adagrad(parameters):\n\t\"\"\"\n\tInitializes the velocity as a python dictionary with:\n\t\t\t\t- keys: \"dW1\", \"db1\", ..., \"dWL\", \"dbL\"\n\t\t\t\t- values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters.\n\tArguments:\n\tparameters -- python dictionary containing your parameters.\n\t\t\t\t\tparameters['W' + str(l)] = Wl\n\t\t\t\t\tparameters['b' + str(l)] = bl\n\n\tReturns:\n\tGt -- python dictionary containing sum of the squares of the gradients up to step t.\n\t\t\t\t\tG['dW' + str(l)] = sum of the squares of the gradients up to dwl\n\t\t\t\t\tG['db' + str(l)] = sum of the squares of the gradients up to db1\n\t\"\"\"\n\tL = len(parameters) // 2  # number of layers in the neural networks\n\tG = {}\n\t# Initialize velocity\n\tfor l in range(L):\n\t\tG[\"dW\" + str(l + 1)] = np.zeros(parameters[\"W\" + str(l + 1)].shape)\n\t\tG[\"db\" + str(l + 1)] = np.zeros(parameters[\"b\" + str(l + 1)].shape)\n\treturn G\n\n#AdaGrad\ndef update_parameters_with_adagrad(parameters, grads, G, learning_rate, epsilon = 1e-7):\n\t\"\"\"\n\tUpdate parameters using Momentum\n\tArguments:\n\tparameters -- python dictionary containing your parameters:\n\t\t\t\t\tparameters['W' + str(l)] = Wl\n\t\t\t\t\tparameters['b' + str(l)] = bl\n\tgrads -- python dictionary containing your gradients for each parameters:\n\t\t\t\t\tgrads['dW' + str(l)] = dWl\n\t\t\t\t\tgrads['db' + str(l)] = dbl\n\tG -- python dictionary containing the current velocity:\n\t\t\t\t\tG['dW' + str(l)] = ...\n\t\t\t\t\tG['db' + str(l)] = ...\n\tlearning_rate -- the learning rate, scalar\n\tepsilon -- hyperparameter preventing division by zero in adagrad updates\n\n\tReturns:\n\tparameters -- python dictionary containing your updated parameters\n\n\t'''\n\tGW += (dW)^2\n\tW -= learning_rate/sqrt(GW + epsilon)*dW\n\tGb += (db)^2\n\tb -= learning_rate/sqrt(Gb + epsilon)*db\n\t'''\n\t\"\"\"\n\n\tL = len(parameters) // 2  # number of layers in the neural networks\n\n\t# Momentum update for each parameter\n\tfor l in range(L):\n\t\t# compute velocities\n\t\tG[\"dW\" + str(l + 1)] += grads['dW' + str(l + 1)]**2\n\t\tG[\"db\" + 
str(l + 1)] += grads['db' + str(l + 1)]**2\n\t\t# update parameters\n\t\tparameters[\"W\" + str(l + 1)] -= learning_rate / (np.sqrt(G[\"dW\" + str(l + 1)]) + epsilon) * grads['dW' + str(l + 1)]\n\t\tparameters[\"b\" + str(l + 1)] -= learning_rate / (np.sqrt(G[\"db\" + str(l + 1)]) + epsilon) * grads['db' + str(l + 1)]\n\n\treturn parameters\n\n\n#initialize_adadelta\ndef initialize_adadelta(parameters):\n\t\"\"\"\n\tInitializes s, v and delta as three python dictionaries with:\n\t\t\t\t- keys: \"dW1\", \"db1\", ..., \"dWL\", \"dbL\"\n\t\t\t\t- values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters.\n\n\tArguments:\n\tparameters -- python dictionary containing your parameters.\n\t\t\t\t\tparameters[\"W\" + str(l)] = Wl\n\t\t\t\t\tparameters[\"b\" + str(l)] = bl\n\n\tReturns:\n\ts -- python dictionary that will contain the exponentially weighted average of the squared gradients\n\t\t\t\t\ts[\"dW\" + str(l)] = ...\n\t\t\t\t\ts[\"db\" + str(l)] = ...\n\tv -- python dictionary that will contain the RMS-scaled updates\n\t\t\t\tv[\"dW\" + str(l)] = ...\n\t\t\t\tv[\"db\" + str(l)] = ...\n\tdelta -- python dictionary that will contain the exponentially weighted average of the squared updates\n\t\t\t\t\tdelta[\"dW\" + str(l)] = ...\n\t\t\t\t\tdelta[\"db\" + str(l)] = ...\n\n\t\"\"\"\n\n\tL = len(parameters) // 2  # number of layers in the neural networks\n\ts = {}\n\tv = {}\n\tdelta = {}\n\t# Initialize s, v, delta. Input: \"parameters\". Outputs: \"s, v, delta\".\n\tfor l in range(L):\n\t\ts[\"dW\" + str(l + 1)] = np.zeros(parameters[\"W\" + str(l + 1)].shape)\n\t\ts[\"db\" + str(l + 1)] = np.zeros(parameters[\"b\" + str(l + 1)].shape)\n\t\tv[\"dW\" + str(l + 1)] = np.zeros(parameters[\"W\" + str(l + 1)].shape)\n\t\tv[\"db\" + str(l + 1)] = np.zeros(parameters[\"b\" + str(l + 1)].shape)\n\t\tdelta[\"dW\" + str(l + 1)] = np.zeros(parameters[\"W\" + str(l + 1)].shape)\n\t\tdelta[\"db\" + str(l + 1)] = np.zeros(parameters[\"b\" + str(l + 1)].shape)\n\n\treturn s, v, delta\n\n#adadelta\ndef update_parameters_with_adadelta(parameters, grads, rho, s, v, delta, epsilon = 1e-6):\n\t\"\"\"\n\tUpdate parameters using Adadelta\n\tArguments:\n\tparameters -- python dictionary containing your parameters:\n\t\t\t\t\tparameters['W' + str(l)] = Wl\n\t\t\t\t\tparameters['b' + str(l)] = bl\n\tgrads -- python dictionary containing your gradients for each parameters:\n\t\t\t\t\tgrads['dW' + str(l)] = dWl\n\t\t\t\t\tgrads['db' + str(l)] = dbl\n\trho -- decay constant similar to that used in the momentum method, scalar\n\ts -- python dictionary containing the exponentially weighted average of the squared gradients:\n\t\t\t\t\ts['dW' + str(l)] = ...\n\t\t\t\t\ts['db' + str(l)] = ...\n\tdelta -- python dictionary containing the exponentially weighted average of the squared updates:\n\t\t\t\t\tdelta['dW' + str(l)] = ...\n\t\t\t\t\tdelta['db' + str(l)] = ...\n\n\tepsilon -- hyperparameter preventing division by zero in adadelta updates\n\n\tReturns:\n\tparameters -- python dictionary containing your updated parameters\n\n\t'''\n\tSdw = rho*Sdw + (1 - rho)*(dW)^2\n\tSdb = rho*Sdb + (1 - rho)*(db)^2\n\tVdw = sqrt((delta_w + epsilon) / (Sdw + epsilon))*dW\n\tVdb = sqrt((delta_b + epsilon) / (Sdb + epsilon))*db\n\tW -= Vdw\n\tb -= Vdb\n\tdelta_w = rho*delta_w + (1 - rho)*(Vdw)^2\n\tdelta_b = rho*delta_b + (1 - rho)*(Vdb)^2\n\t'''\n\t\"\"\"\n\n\tL = len(parameters) // 2  # number of layers in the neural networks\n\t# adadelta update for each parameter\n\tfor l in range(L):\n\t\t# accumulate the squared-gradient average\n\t\ts[\"dW\" + str(l + 1)] = rho * s[\"dW\" + str(l + 1)] + (1 - rho)*grads['dW' + str(l + 1)]**2\n\t\ts[\"db\" + str(l + 1)] = rho * s[\"db\" + str(l + 1)] + (1 - rho)*grads['db' + str(l + 1)]**2\n\t\t# compute the RMS-scaled update\n\t\tv[\"dW\" + str(l + 1)] = np.sqrt((delta[\"dW\" + str(l + 1)] + epsilon) / (s[\"dW\" + str(l + 1)] + epsilon)) * grads['dW' + str(l + 1)]\n\t\tv[\"db\" + str(l + 1)] = np.sqrt((delta[\"db\" + str(l + 1)] + epsilon) / (s[\"db\" + str(l + 1)] + epsilon)) * grads['db' + str(l + 1)]\n\t\t# update parameters\n\t\tparameters[\"W\" + str(l + 1)] -= v[\"dW\" + str(l + 1)]\n\t\tparameters[\"b\" + str(l + 1)] -= v[\"db\" + str(l + 1)]\n\t\t# accumulate the squared-update average\n\t\tdelta[\"dW\" + str(l + 1)] = rho * delta[\"dW\" + str(l + 1)] + (1 - rho) * v[\"dW\" + str(l + 1)] ** 2\n\t\tdelta[\"db\" + str(l + 1)] = rho * delta[\"db\" + str(l + 1)] + (1 - rho) * v[\"db\" + str(l + 1)] ** 2\n\n\treturn parameters\n\n#RMSprop\ndef update_parameters_with_rmsprop(parameters, grads, s, beta = 0.9, learning_rate = 0.01, epsilon = 1e-6):\n\t\"\"\"\n\tUpdate parameters using RMSprop\n\tArguments:\n\tparameters -- python dictionary containing your parameters:\n\t\t\t\t\tparameters['W' + str(l)] = Wl\n\t\t\t\t\tparameters['b' + str(l)] = bl\n\tgrads -- python dictionary containing your gradients for each parameters:\n\t\t\t\t\tgrads['dW' + str(l)] = dWl\n\t\t\t\t\tgrads['db' + str(l)] = dbl\n\ts -- python dictionary containing the exponentially weighted average of the squared gradients:\n\t\t\t\t\ts['dW' + str(l)] = ...\n\t\t\t\t\ts['db' + str(l)] = ...\n\tbeta -- the decay hyperparameter for the squared-gradient average, scalar\n\tlearning_rate -- the learning rate, scalar\n\n\tReturns:\n\tparameters -- python dictionary containing your updated parameters\n\t'''\n\tSdW = beta * SdW + (1-beta) * (dW)^2\n\tSdb = beta * Sdb + (1-beta) * (db)^2\n\tW = W - learning_rate * dW/sqrt(SdW + epsilon)\n\tb = b - learning_rate * db/sqrt(Sdb + epsilon)\n\t'''\n\t\"\"\"\n\tL = len(parameters) // 2  # number of layers in the neural networks\n\t# rmsprop update for each parameter\n\tfor l in range(L):\n\t\t# accumulate the squared-gradient average\n\t\ts[\"dW\" + str(l + 1)] = beta * s[\"dW\" + str(l + 1)] + (1 - beta) * grads['dW' + str(l + 1)]**2\n\t\ts[\"db\" + str(l + 1)] = beta * s[\"db\" + str(l + 1)] + (1 - beta) * grads['db' + str(l + 1)]**2\n\t\t# update parameters\n\t\tparameters[\"W\" + str(l + 1)] = parameters[\"W\" + str(l + 1)] - learning_rate * grads['dW' + str(l + 1)] / np.sqrt(s[\"dW\" + str(l + 1)] + epsilon)\n\t\tparameters[\"b\" + str(l + 1)] = parameters[\"b\" + str(l + 1)] - learning_rate * grads['db' + str(l + 1)] / np.sqrt(s[\"db\" + str(l + 1)] + epsilon)\n\n\treturn parameters\n\n#initialize adam\ndef initialize_adam(parameters):\n\t\"\"\"\n\tInitializes v and s as two python dictionaries with:\n\t\t\t\t- keys: \"dW1\", \"db1\", ..., \"dWL\", \"dbL\"\n\t\t\t\t- values: numpy arrays of zeros of the same shape as the corresponding gradients/parameters.\n\tArguments:\n\tparameters -- python dictionary containing your parameters.\n\t\t\t\t\tparameters[\"W\" + str(l)] = Wl\n\t\t\t\t\tparameters[\"b\" + str(l)] = bl\n\tReturns:\n\tv -- python dictionary that will contain the exponentially weighted average of the gradient.\n\t\t\t\t\tv[\"dW\" + str(l)] = ...\n\t\t\t\t\tv[\"db\" + str(l)] = ...\n\ts -- python dictionary that will contain the exponentially weighted average of the squared gradient.\n\t\t\t\t\ts[\"dW\" + str(l)] = ...\n\t\t\t\t\ts[\"db\" + str(l)] = ...\n\n\t\"\"\"\n\tL = len(parameters) // 2  # number of layers in the neural networks\n\tv = {}\n\ts = {}\n\t# Initialize v, s. Input: \"parameters\". 
Outputs: \"v, s\".\n\tfor l in range(L):\n\t\tv[\"dW\" + str(l + 1)] = np.zeros(parameters[\"W\" + str(l + 1)].shape)\n\t\tv[\"db\" + str(l + 1)] = np.zeros(parameters[\"b\" + str(l + 1)].shape)\n\t\ts[\"dW\" + str(l + 1)] = np.zeros(parameters[\"W\" + str(l + 1)].shape)\n\t\ts[\"db\" + str(l + 1)] = np.zeros(parameters[\"b\" + str(l + 1)].shape)\n\n\treturn v, s\n\n#adam\ndef update_parameters_with_adam(parameters, grads, v, s, t, learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-8):\n\t\"\"\"\n\tUpdate parameters using Adam\n\n\tArguments:\n\tparameters -- python dictionary containing your parameters:\n\t\t\t\t\tparameters['W' + str(l)] = Wl\n\t\t\t\t\tparameters['b' + str(l)] = bl\n\tgrads -- python dictionary containing your gradients for each parameters:\n\t\t\t\t\tgrads['dW' + str(l)] = dWl\n\t\t\t\t\tgrads['db' + str(l)] = dbl\n\tv -- Adam variable, moving average of the first gradient, python dictionary\n\ts -- Adam variable, moving average of the squared gradient, python dictionary\n\tlearning_rate -- the learning rate, scalar.\n\tbeta1 -- Exponential decay hyperparameter for the first moment estimates\n\tbeta2 -- Exponential decay hyperparameter for the second moment estimates\n\tepsilon -- hyperparameter preventing division by zero in Adam updates\n\n\tReturns:\n\tparameters -- python dictionary containing your updated parameters\n\t\"\"\"\n\n\tL = len(parameters) // 2  # number of layers in the neural networks\n\tv_corrected = {}  # Initializing first moment estimate, python dictionary\n\ts_corrected = {}  # Initializing second moment estimate, python dictionary\n\n\t# Perform Adam update on all parameters\n\tfor l in range(L):\n\t\t# Moving average of the gradients. Inputs: \"v, grads, beta1\". Output: \"v\".\n\t\tv[\"dW\" + str(l + 1)] = beta1 * v[\"dW\" + str(l + 1)] + (1 - beta1) * grads['dW' + str(l + 1)]\n\t\tv[\"db\" + str(l + 1)] = beta1 * v[\"db\" + str(l + 1)] + (1 - beta1) * grads['db' + str(l + 1)]\n\t\t# Compute bias-corrected first moment estimate. Inputs: \"v, beta1, t\". Output: \"v_corrected\".\n\t\tv_corrected[\"dW\" + str(l + 1)] = v[\"dW\" + str(l + 1)] / (1 - np.power(beta1, t))\n\t\tv_corrected[\"db\" + str(l + 1)] = v[\"db\" + str(l + 1)] / (1 - np.power(beta1, t))\n\t\t# Moving average of the squared gradients. Inputs: \"s, grads, beta2\". Output: \"s\".\n\t\ts[\"dW\" + str(l + 1)] = beta2 * s[\"dW\" + str(l + 1)] + (1 - beta2) * np.power(grads['dW' + str(l + 1)], 2)\n\t\ts[\"db\" + str(l + 1)] = beta2 * s[\"db\" + str(l + 1)] + (1 - beta2) * np.power(grads['db' + str(l + 1)], 2)\n\t\t# Compute bias-corrected second raw moment estimate. Inputs: \"s, beta2, t\". Output: \"s_corrected\".\n\t\ts_corrected[\"dW\" + str(l + 1)] = s[\"dW\" + str(l + 1)] / (1 - np.power(beta2, t))\n\t\ts_corrected[\"db\" + str(l + 1)] = s[\"db\" + str(l + 1)] / (1 - np.power(beta2, t))\n\t\t# Update parameters. Inputs: \"parameters, learning_rate, v_corrected, s_corrected, epsilon\". 
Output: \"parameters\".\n\t\tparameters[\"W\" + str(l + 1)] = parameters[\"W\" + str(l + 1)] - learning_rate * v_corrected[\"dW\" + str(l + 1)] / np.sqrt(s_corrected[\"dW\" + str(l + 1)] + epsilon)\n\t\tparameters[\"b\" + str(l + 1)] = parameters[\"b\" + str(l + 1)] - learning_rate * v_corrected[\"db\" + str(l + 1)] / np.sqrt(s_corrected[\"db\" + str(l + 1)] + epsilon)\n\n\treturn parameters\n\n\ndef L_layer_model(X, Y, layer_dims, learning_rate, num_iterations, optimizer, beta = 0.9, beta2 = 0.999, mini_batch_size = 64, epsilon = 1e-8):\n\t\"\"\"\n\t:param X:\n\t:param Y:\n\t:param layer_dims:list containing the input size and each layer size\n\t:param learning_rate:\n\t:param num_iterations:\n\t:return:\n\tparameters：final parameters:(W,b)\n\t\"\"\"\n\tcosts = []\n\t# initialize parameters\n\tparameters = initialize_parameters(layer_dims)\n\tif optimizer == \"sgd\":\n\t\tpass  # no initialization required for gradient descent\n\telif optimizer == \"momentum\" or optimizer == \"nesterov_momentum\" or optimizer == \"rmsprop\":\n\t\tv = initialize_velocity(parameters)\n\telif optimizer == \"adagrad\":\n\t\tG = initialize_adagrad(parameters)\n\telif optimizer == \"adadelta\":\n\t\ts, v, delta = initialize_adadelta(parameters)\n\telif optimizer == \"adam\":\n\t\tv, s = initialize_adam(parameters)\n\tt = 0 # initializing the counter required for Adam update\n\tseed = 0\n\tfor i in range(0, num_iterations):\n\t\t# Define the random minibatches. We increment the seed to reshuffle differently the dataset after each epoch\n\t\tseed = seed + 1\n\t\tminibatches = random_mini_batches(X, Y, mini_batch_size, seed)\n\t\tfor minibatch in minibatches:\n\t\t\t# Select a minibatch\n\t\t\t(minibatch_X, minibatch_Y) = minibatch\n\t\t\t# Forward propagation\n\t\t\tAL, caches = forward_propagation(minibatch_X, parameters)\n\t\t\t# Compute cost\n\t\t\tcost = compute_cost(AL, minibatch_Y)\n\t\t\t# Backward propagation\n\t\t\tgrads = backward_propagation(AL, minibatch_Y, caches)\n\t\t\tif optimizer == \"sgd\":\n\t\t\t\tparameters = update_parameters_with_gd(parameters, grads, learning_rate)\n\t\t\telif optimizer == \"momentum\":\n\t\t\t\tparameters = update_parameters_with_momentum(parameters, grads, v, beta, learning_rate)\n\t\t\telif optimizer == \"nesterov_momentum\":\n\t\t\t\tparameters = update_parameters_with_nesterov_momentum(parameters, grads, v, beta, learning_rate)\n\t\t\telif optimizer == \"adagrad\":\n\t\t\t\tparameters = update_parameters_with_adagrad(parameters,grads,G,learning_rate,epsilon)\n\t\t\telif optimizer == \"adadelta\":\n\t\t\t\tparameters = update_parameters_with_adadelta(parameters,grads,beta,s,v,delta,epsilon)\n\t\t\telif optimizer == \"rmsprop\":\n\t\t\t\tparameters = update_parameters_with_rmsprop(parameters, grads, v, beta, learning_rate, epsilon)\n\t\t\telif optimizer == \"adam\":\n\t\t\t\tt += 1\n\t\t\t\tparameters = update_parameters_with_adam(parameters, grads, v, s, t, learning_rate, beta, beta2, epsilon)\n\n\t\tif i % 100 == 0:\n\t\t\tprint(\"Cost after iteration {}: {}\".format(i, cost))\n\t\t\tcosts.append(cost)\n\tprint('length of cost')\n\tprint(len(costs))\n\tplt.clf()\n\tplt.plot(costs, label = optimizer)\n\tplt.xlabel(\"iterations(hundreds)\")  # 横坐标名字\n\tplt.ylabel(\"cost\")  # 纵坐标名字\n\tplt.legend(loc=\"best\")\n\tplt.show()\n\treturn parameters\n\n#predict function\ndef predict(X_test,y_test,parameters):\n\t\"\"\"\n\t:param X:\n\t:param y:\n\t:param parameters:\n\t:return:\n\t\"\"\"\n\tm = y_test.shape[1]\n\tY_prediction = np.zeros((1, m))\n\tprob, caches = 
forward_propagation(X_test,parameters)\n\tfor i in range(prob.shape[1]):\n\t\t# Convert probabilities A[0,i] to actual predictions p[0,i]\n\t\tif prob[0, i] > 0.5:\n\t\t\tY_prediction[0, i] = 1\n\t\telse:\n\t\t\tY_prediction[0, i] = 0\n\taccuracy = 1- np.mean(np.abs(Y_prediction - y_test))\n\treturn accuracy\n#DNN model\ndef DNN(X_train, y_train, X_test, y_test, layer_dims, learning_rate= 0.0005, num_iterations=10000,optimizer = 'sgd', beta = 0.9, beta2 = 0.999, mini_batch_size = 64,epsilon = 1e-8):\n\tparameters = L_layer_model(X_train, y_train, layer_dims, learning_rate, num_iterations, optimizer, beta, beta2, mini_batch_size, epsilon)\n\taccuracy = predict(X_test,y_test,parameters)\n\treturn accuracy\n\nif __name__ == \"__main__\":\n\tX_data, y_data = load_breast_cancer(return_X_y=True)\n\tX_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,random_state=28)\n\tX_train = X_train.T\n\ty_train = y_train.reshape(y_train.shape[0], -1).T\n\tX_test = X_test.T\n\ty_test = y_test.reshape(y_test.shape[0], -1).T\n\t# #mini-batch\n\t# accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], num_iterations=10000)\n\t# print(accuracy)\n\t# # momentum\n\t# accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], num_iterations=10000, optimizer='momentum')\n\t# print(accuracy)\n\t# nesterov momentum\n\t# accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], learning_rate= 0.0001,num_iterations=10000,optimizer='nesterov_momentum')\n\t# print(accuracy)\n\t#adagrad\n\t# accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], learning_rate= 0.01,num_iterations=10000,optimizer='adagrad')\n\t# print(accuracy)\n\t#adadelta\n\t# accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1],num_iterations=10000, beta= 0.9, epsilon=1e-6, optimizer='adadelta')\n\t# print(accuracy)\n\t# #RMSprop\n\t# accuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], learning_rate=0.001, num_iterations=10000, beta=0.9,epsilon=1e-6, optimizer='rmsprop')\n\t# print(accuracy)\n\t#adam\n\taccuracy = DNN(X_train, y_train, X_test, y_test, [X_train.shape[0], 10, 5, 1], learning_rate=0.001, num_iterations=10000, beta=0.9, beta2=0.999, epsilon=1e-8, optimizer='adam')\n\tprint(accuracy)"
  },
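The Adam update above combines a bias-corrected first moment (the averaged gradient) with a bias-corrected second moment (the averaged squared gradient). On the very first step the corrections make `v_hat` equal the raw gradient, so the step size is close to the learning rate. A scalar walk-through of one step, matching the formulas in `update_parameters_with_adam` (all values illustrative):

```python
import numpy as np

# One Adam step on a single scalar parameter; v and s start at 0, t starts at 1.
beta1, beta2, lr, eps = 0.9, 0.999, 0.001, 1e-8
w, dw, v, s, t = 1.0, 0.5, 0.0, 0.0, 1
v = beta1 * v + (1 - beta1) * dw          # first-moment estimate
s = beta2 * s + (1 - beta2) * dw ** 2     # second-moment estimate
v_hat = v / (1 - beta1 ** t)              # bias correction -> dw
s_hat = s / (1 - beta2 ** t)              # bias correction -> dw**2
w -= lr * v_hat / np.sqrt(s_hat + eps)    # epsilon inside sqrt, as in the file
print(w)  # approximately 1.0 - 0.001
```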
  {
    "path": "dinos.txt",
    "content": "Aachenosaurus\nAardonyx\nAbdallahsaurus\nAbelisaurus\nAbrictosaurus\nAbrosaurus\nAbydosaurus\nAcanthopholis\nAchelousaurus\nAcheroraptor\nAchillesaurus\nAchillobator\nAcristavus\nAcrocanthosaurus\nAcrotholus\nActiosaurus\nAdamantisaurus\nAdasaurus\nAdelolophus\nAdeopapposaurus\nAegyptosaurus\nAeolosaurus\nAepisaurus\nAepyornithomimus\nAerosteon\nAetonyxAfromimus\nAfrovenator\nAgathaumas\nAggiosaurus\nAgilisaurus\nAgnosphitys\nAgrosaurus\nAgujaceratops\nAgustinia\nAhshislepelta\nAirakoraptor\nAjancingenia\nAjkaceratops\nAlamosaurus\nAlaskacephale\nAlbalophosaurus\nAlbertaceratops\nAlbertadromeus\nAlbertavenator\nAlbertonykus\nAlbertosaurus\nAlbinykus\nAlbisaurus\nAlcovasaurus\nAlectrosaurus\nAletopelta\nAlgoasaurus\nAlioramus\nAliwalia\nAllosaurus\nAlmas\nAlnashetri\nAlocodon\nAltirhinus\nAltispinax\nAlvarezsaurus\nAlwalkeria\nAlxasaurus\nAmargasaurus\nAmargastegos\nAmargatitanis\nAmazonsaurus\nAmmosaurus\nAmpelosaurus\nAmphicoelias\nAmphicoelicaudia\nAmphisaurus\nAmtocephale\nAmtosaurus\nAmurosaurus\nAmygdalodon\nAnabisetia\nAnasazisaurus\nAnatosaurus\nAnatotitan\nAnchiceratops\nAnchiornis\nAnchisaurus\nAndesaurus\nAndhrasaurus\nAngaturama\nAngloposeidon\nAngolatitan\nAngulomastacator\nAniksosaurus\nAnimantarx\nAnkistrodon\nAnkylosaurus\nAnodontosaurus\nAnoplosaurus\nAnserimimus\nAntarctopelta\nAntarctosaurus\nAntetonitrus\nAnthodon\nAntrodemus\nAnzu\nAoniraptor\nAorun\nApatodon\nApatoraptor\nApatosaurus\nAppalachiosaurus\nAquilops\nAragosaurus\nAralosaurus\nAraucanoraptor\nArchaeoceratops\nArchaeodontosaurus\nArchaeopteryx\nArchaeoraptor\nArchaeornis\nArchaeornithoides\nArchaeornithomimus\nArcovenator\nArctosaurus\nArcusaurus\nArenysaurus\nArgentinosaurus\nArgyrosaurus\nAristosaurus\nAristosuchus\nArizonasaurus\nArkansaurus\nArkharavia\nArrhinoceratops\nArstanosaurus\nAsiaceratops\nAsiamericana\nAsiatosaurus\nAstrodon\nAstrodonius\nAstrodontaurus\nAstrophocaudia\nAsylosaurus\nAtacamatitan\nAtlantosaurus\nAtlasaurus\nAtlascopcosaurus\nAtrociraptor\nAtsinganosaurus\nAublysodon\nAucasaurus\nAugustia\nAugustynolophus\nAuroraceratops\nAurornis\nAustralodocus\nAustralovenator\nAustrocheirus\nAustroposeidon\nAustroraptor\nAustrosaurus\nAvaceratops\nAvalonia\nAvalonianus\nAviatyrannis\nAvimimus\nAvisaurus\nAvipes\nAzendohsaurus\nBactrosaurus\nBagaceratops\nBagaraatan\nBahariasaurus\nBainoceratops\nBakesaurus\nBalaur\nBalochisaurus\nBambiraptor\nBanji\nBaotianmansaurus\nBarapasaurus\nBarilium\nBarosaurus\nBarrosasaurus\nBarsboldia\nBaryonyx\nBashunosaurus\nBasutodon\nBathygnathus\nBatyrosaurus\nBaurutitan\nBayosaurus\nBecklespinax\nBeelemodon\nBeibeilong\nBeipiaognathus\nBeipiaosaurus\nBeishanlong\nBellusaurus\nBelodon\nBerberosaurus\nBetasuchus\nBicentenaria\nBienosaurus\nBihariosaurus\nBilbeyhallorum\nBissektipelta\nBistahieversor\nBlancocerosaurus\nBlasisaurus\nBlikanasaurus\nBolong\nBonapartenykus\nBonapartesaurus\nBonatitan\nBonitasaura\nBorealopelta\nBorealosaurus\nBoreonykus\nBorogovia\nBothriospondylus\nBrachiosaurus\nBrachyceratops\nBrachylophosaurus\nBrachypodosaurus\nBrachyrophus\nBrachytaenius\nBrachytrachelopan\nBradycneme\nBrasileosaurus\nBrasilotitan\nBravoceratops\nBreviceratops\nBrohisaurus\nBrontomerus\nBrontoraptor\nBrontosaurus\nBruhathkayosaurus\nBugenasaura\nBuitreraptor\nBurianosaurus\nBuriolestes\nByranjaffia\nByronosaurus\nCaenagnathasia\nCaenagnathus\nCalamosaurus\nCalamospondylus\nCalamospondylus\nCallovosaurus\nCamarasaurus\nCamarillasaurus\nCamelotia\nCamposaurus\nCamptonotus\nCamptosaurus\nCampylodon\nCampylodoniscus\nCanardia\nCapitalsaurus\nCarcharodo
ntosaurus\nCardiodon\nCarnotaurus\nCaseosaurus\nCathartesaura\nCathetosaurus\nCaudipteryx\nCaudocoelus\nCaulodon\nCedarosaurus\nCedarpelta\nCedrorestes\nCentemodon\nCentrosaurus\nCerasinops\nCeratonykus\nCeratops\nCeratosaurus\nCetiosauriscus\nCetiosaurus\nChangchunsaurus\nChangdusaurus\nChangyuraptor\nChaoyangsaurus\nCharonosaurus\nChasmosaurus\nChassternbergia\nChebsaurus\nChenanisaurus\nCheneosaurus\nChialingosaurus\nChiayusaurus\nChienkosaurus\nChihuahuasaurus\nChilantaisaurus\nChilesaurus\nChindesaurus\nChingkankousaurus\nChinshakiangosaurus\nChirostenotes\nChoconsaurus\nChondrosteosaurus\nChromogisaurus\nChuandongocoelurus\nChuanjiesaurus\nChuanqilong\nChubutisaurus\nChungkingosaurus\nChuxiongosaurus\nCinizasaurus\nCionodon\nCitipati\nCladeiodon\nClaorhynchus\nClaosaurus\nClarencea\nClasmodosaurus\nClepsysaurus\nCoahuilaceratops\nCoelophysis\nCoelosaurus\nCoeluroides\nCoelurosauravus\nCoelurus\nColepiocephale\nColoradia\nColoradisaurus\nColossosaurus\nComahuesaurus\nComanchesaurus\nCompsognathus\nCompsosuchus\nConcavenator\nConchoraptor\nCondorraptor\nCoronosaurus\nCorythoraptor\nCorythosaurus\nCraspedodon\nCrataeomus\nCraterosaurus\nCreosaurus\nCrichtonpelta\nCrichtonsaurus\nCristatusaurus\nCrosbysaurus\nCruxicheiros\nCryolophosaurus\nCryptodraco\nCryptoraptor\nCryptosaurus\nCryptovolans\nCumnoria\nDaanosaurus\nDacentrurus\nDachongosaurus\nDaemonosaurus\nDahalokely\nDakosaurus\nDakotadon\nDakotaraptor\nDaliansaurus\nDamalasaurus\nDandakosaurus\nDanubiosaurus\nDaptosaurus\nDarwinsaurus\nDashanpusaurus\nDaspletosaurus\nDasygnathoides\nDasygnathus\nDatanglong\nDatonglong\nDatousaurus\nDaurosaurus\nDaxiatitan\nDeinocheirus\nDeinodon\nDeinonychus\nDelapparentia\nDeltadromeus\nDemandasaurus\nDenversaurus\nDeuterosaurus\nDiabloceratops\nDiamantinasaurus\nDianchungosaurus\nDiceratops\nDiceratus\nDiclonius\nDicraeosaurus\nDidanodon\nDilong\nDilophosaurus\nDiluvicursor\nDimodosaurus\nDinheirosaurus\nDinodocus\nDinotyrannus\nDiplodocus\nDiplotomodon\nDiracodon\nDolichosuchus\nDollodon\nDomeykosaurus\nDongbeititan\nDongyangopelta\nDongyangosaurus\nDoratodon\nDoryphorosaurus\nDraconyx\nDracopelta\nDracoraptor\nDracorex\nDracovenator\nDravidosaurus\nDreadnoughtus\nDrinker\nDromaeosauroides\nDromaeosaurus\nDromiceiomimus\nDromicosaurus\nDrusilasaura\nDryosaurus\nDryptosauroides\nDryptosaurus\nDubreuillosaurus\nDuriatitan\nDuriavenator\nDynamosaurus\nDyoplosaurus\nDysalotosaurus\nDysganus\nDyslocosaurus\nDystrophaeus\nDystylosaurus\nEchinodon\nEdmarka\nEdmontonia\nEdmontosaurus\nEfraasia\nEiniosaurus\nEkrixinatosaurus\nElachistosuchus\nElaltitan\nElaphrosaurus\nElmisaurus\nElopteryx\nElosaurus\nElrhazosaurus\nElvisaurus\nEmausaurus\nEmbasaurus\nEnigmosaurus\nEoabelisaurus\nEobrontosaurus\nEocarcharia\nEoceratops\nEocursor\nEodromaeus\nEohadrosaurus\nEolambia\nEomamenchisaurus\nEoplophysis\nEoraptor\nEosinopteryx\nEotrachodon\nEotriceratops\nEotyrannus\nEousdryosaurus\nEpachthosaurus\nEpanterias\nEphoenosaurus\nEpicampodon\nEpichirostenotes\nEpidendrosaurus\nEpidexipteryx\nEquijubus\nErectopus\nErketu\nErliansaurus\nErlikosaurus\nEshanosaurus\nEuacanthus\nEucamerotus\nEucentrosaurus\nEucercosaurus\nEucnemesaurus\nEucoelophysis\nEugongbusaurus\nEuhelopus\nEuoplocephalus\nEupodosaurus\nEureodon\nEurolimnornis\nEuronychodon\nEuropasaurus\nEuropatitan\nEuropelta\nEuskelosaurus\nEustreptospondylus\nFabrosaurus\nFalcarius\nFendusaurus\nFenestrosaurus\nFerganasaurus\nFerganastegos\nFerganocephale\nForaminacephale\nFosterovenator\nFrenguellisaurus\nFruitadens\nFukuiraptor\nFukuisaurus\nFukuititan\nFukuivenator\n
Fulengia\nFulgurotherium\nFusinasus\nFusuisaurus\nFutabasaurus\nFutalognkosaurus\nGadolosaurus\nGaleamopus\nGalesaurus\nGallimimus\nGaltonia\nGalveosaurus\nGalvesaurus\nGannansaurus\nGansutitan\nGanzhousaurus\nGargoyleosaurus\nGarudimimus\nGasosaurus\nGasparinisaura\nGastonia\nGavinosaurus\nGeminiraptor\nGenusaurus\nGenyodectes\nGeranosaurus\nGideonmantellia\nGiganotosaurus\nGigantoraptor\nGigantosaurus\nGigantosaurus\nGigantoscelus\nGigantspinosaurus\nGilmoreosaurus\nGinnareemimus\nGiraffatitan\nGlacialisaurus\nGlishades\nGlyptodontopelta\nGobiceratops\nGobisaurus\nGobititan\nGobivenator\nGodzillasaurus\nGojirasaurus\nGondwanatitan\nGongbusaurus\nGongpoquansaurus\nGongxianosaurus\nGorgosaurus\nGoyocephale\nGraciliceratops\nGraciliraptor\nGracilisuchus\nGravitholus\nGresslyosaurus\nGriphornis\nGriphosaurus\nGryphoceratops\nGryponyx\nGryposaurus\nGspsaurus\nGuaibasaurus\nGualicho\nGuanlong\nGwyneddosaurus\nGyposaurus\nHadrosauravus\nHadrosaurus\nHaestasaurus\nHagryphus\nHallopus\nHalszkaraptor\nHalticosaurus\nHanssuesia\nHanwulosaurus\nHaplocanthosaurus\nHaplocanthus\nHaplocheirus\nHarpymimus\nHaya\nHecatasaurus\nHeilongjiangosaurus\nHeishansaurus\nHelioceratops\nHelopus\nHeptasteornis\nHerbstosaurus\nHerrerasaurus\nHesperonychus\nHesperosaurus\nHeterodontosaurus\nHeterosaurus\nHexing\nHexinlusaurus\nHeyuannia\nHierosaurus\nHippodraco\nHironosaurus\nHisanohamasaurus\nHistriasaurus\nHomalocephale\nHonghesaurus\nHongshanosaurus\nHoplitosaurus\nHoplosaurus\nHorshamosaurus\nHortalotarsus\nHuabeisaurus\nHualianceratops\nHuanansaurus\nHuanghetitan\nHuangshanlong\nHuaxiagnathus\nHuaxiaosaurus\nHuaxiasaurus\nHuayangosaurus\nHudiesaurus\nHuehuecanauhtlus\nHulsanpes\nHungarosaurus\nHuxleysaurus\nHylaeosaurus\nHylosaurus\nHypacrosaurus\nHypselorhachis\nHypselosaurus\nHypselospinus\nHypsibema\nHypsilophodon\nHypsirhophus\nhabodcraniosaurus\nIchthyovenator\nIgnavusaurus\nIguanacolossus\nIguanodon\nIguanoides\nIguanosaurus\nIliosuchus\nIlokelesia\nIncisivosaurus\nIndosaurus\nIndosuchus\nIngenia\nInosaurus\nIrritator\nIsaberrysaura\nIsanosaurus\nIschioceratops\nIschisaurus\nIschyrosaurus\nIsisaurus\nIssasaurus\nItemirus\nIuticosaurus\nJainosaurus\nJaklapallisaurus\nJanenschia\nJaxartosaurus\nJeholosaurus\nJenghizkhan\nJensenosaurus\nJeyawati\nJianchangosaurus\nJiangjunmiaosaurus\nJiangjunosaurus\nJiangshanosaurus\nJiangxisaurus\nJianianhualong\nJinfengopteryx\nJingshanosaurus\nJintasaurus\nJinzhousaurus\nJiutaisaurus\nJobaria\nJubbulpuria\nJudiceratops\nJurapteryx\nJurassosaurus\nJuratyrant\nJuravenator\nKagasaurus\nKaijiangosaurus\nKakuru\nKangnasaurus\nKarongasaurus\nKatepensaurus\nKatsuyamasaurus\nKayentavenator\nKazaklambia\nKelmayisaurus\nKemkemia\nKentrosaurus\nKentrurosaurus\nKerberosaurus\nKhaan\nKhetranisaurus\nKileskus\nKinnareemimus\nKitadanisaurus\nKittysaurus\nKlamelisaurus\nKol\nKoparion\nKoreaceratops\nKoreanosaurus\nKoreanosaurus\nKoshisaurus\nKosmoceratops\nKotasaurus\nKoutalisaurus\nKritosaurus\nKryptops\nKrzyzanowskisaurus\nKukufeldia\nKulceratops\nKulindadromeus\nKulindapteryx\nKunbarrasaurus\nKundurosaurus\nKunmingosaurus\nKuszholia\nLabocania\nLabrosaurus\nLaelaps\nLaevisuchus\nLagerpeton\nLagosuchus\nLaiyangosaurus\nLamaceratops\nLambeosaurus\nLametasaurus\nLamplughsaura\nLanasaurus\nLancangosaurus\nLancanjiangosaurus\nLanzhousaurus\nLaosaurus\nLapampasaurus\nLaplatasaurus\nLapparentosaurus\nLaquintasaura\nLatenivenatrix\nLatirhinus\nLeaellynasaura\nLeinkupal\nLeipsanosaurus\nLengosaurus\nLeonerasaurus\nLepidocheirosaurus\nLepidus\nLeptoceratops\nLeptorhyn
chos\nLeptospondylus\nLeshansaurus\nLesothosaurus\nLessemsaurus\nLevnesovia\nLewisuchus\nLexovisaurus\nLeyesaurus\nLiaoceratops\nLiaoningosaurus\nLiaoningtitan\nLiaoningvenator\nLiassaurus\nLibycosaurus\nLigabueino\nLigabuesaurus\nLigomasaurus\nLikhoelesaurus\nLiliensternus\nLimaysaurus\nLimnornis\nLimnosaurus\nLimusaurus\nLinhenykus\nLinheraptor\nLinhevenator\nLirainosaurus\nLisboasaurus\nLiubangosaurus\nLohuecotitan\nLoncosaurus\nLongisquama\nLongosaurus\nLophorhothon\nLophostropheus\nLoricatosaurus\nLoricosaurus\nLosillasaurus\nLourinhanosaurus\nLourinhasaurus\nLuanchuanraptor\nLuanpingosaurus\nLucianosaurus\nLucianovenator\nLufengosaurus\nLukousaurus\nLuoyanggia\nLurdusaurus\nLusitanosaurus\nLusotitan\nLycorhinus\nLythronax\nMacelognathus\nMachairasaurus\nMachairoceratops\nMacrodontophion\nMacrogryphosaurus\nMacrophalangia\nMacroscelosaurus\nMacrurosaurus\nMadsenius\nMagnapaulia\nMagnamanus\nMagnirostris\nMagnosaurus\nMagulodon\nMagyarosaurus\nMahakala\nMaiasaura\nMajungasaurus\nMajungatholus\nMalarguesaurus\nMalawisaurus\nMaleevosaurus\nMaleevus\nMamenchisaurus\nManidens\nMandschurosaurus\nManospondylus\nMantellisaurus\nMantellodon\nMapusaurus\nMarasuchus\nMarisaurus\nMarmarospondylus\nMarshosaurus\nMartharaptor\nMasiakasaurus\nMassospondylus\nMatheronodon\nMaxakalisaurus\nMedusaceratops\nMegacervixosaurus\nMegadactylus\nMegadontosaurus\nMegalosaurus\nMegapnosaurus\nMegaraptor\nMei\nMelanorosaurus\nMendozasaurus\nMercuriceratops\nMeroktenos\nMetriacanthosaurus\nMicrocephale\nMicroceratops\nMicroceratus\nMicrocoelus\nMicrodontosaurus\nMicrohadrosaurus\nMicropachycephalosaurus\nMicroraptor\nMicrovenator\nMierasaurus\nMifunesaurus\nMinmi\nMinotaurasaurus\nMiragaia\nMirischia\nMoabosaurus\nMochlodon\nMohammadisaurus\nMojoceratops\nMongolosaurus\nMonkonosaurus\nMonoclonius\nMonolophosaurus\nMononychus\nMononykus\nMontanoceratops\nMorelladon\nMorinosaurus\nMorosaurus\nMorrosaurus\nMosaiceratops\nMoshisaurus\nMtapaiasaurus\nMtotosaurus\nMurusraptor\nMussaurus\nMuttaburrasaurus\nMuyelensaurus\nMymoorapelta\nNaashoibitosaurus\nNambalia\nNankangia\nNanningosaurus\nNanosaurus\nNanotyrannus\nNanshiungosaurus\nNanuqsaurus\nNanyangosaurus\nNarambuenatitan\nNasutoceratops\nNatronasaurus\nNebulasaurus\nNectosaurus\nNedcolbertia\nNedoceratops\nNeimongosaurus\nNemegtia\nNemegtomaia\nNemegtosaurus\nNeosaurus\nNeosodon\nNeovenator\nNeuquenraptor\nNeuquensaurus\nNewtonsaurus\nNgexisaurus\nNicksaurus\nNigersaurus\nNingyuansaurus\nNiobrarasaurus\nNipponosaurus\nNoasaurus\nNodocephalosaurus\nNodosaurus\nNomingia\nNopcsaspondylus\nNormanniasaurus\nNothronychus\nNotoceratops\nNotocolossus\nNotohypsilophodon\nNqwebasaurus\nNteregosaurus\nNurosaurus\nNuthetes\nNyasasaurus\nNyororosaurus\nOhmdenosaurus\nOjoceratops\nOjoraptorsaurus\nOligosaurus\nOlorotitan\nOmeisaurus\nOmosaurus\nOnychosaurus\nOohkotokia\nOpisthocoelicaudia\nOplosaurus\nOrcomimus\nOrinosaurus\nOrkoraptor\nOrnatotholus\nOrnithodesmus\nOrnithoides\nOrnitholestes\nOrnithomerus\nOrnithomimoides\nOrnithomimus\nOrnithopsis\nOrnithosuchus\nOrnithotarsus\nOrodromeus\nOrosaurus\nOrthogoniosaurus\nOrthomerus\nOryctodromeus\nOshanosaurus\nOsmakasaurus\nOstafrikasaurus\nOstromia\nOthnielia\nOthnielosaurus\nOtogosaurus\nOuranosaurus\nOverosaurus\nOviraptor\nOvoraptor\nOwenodon\nOxalaia\nOzraptor\nPachycephalosaurus\nPachyrhinosaurus\nPachysauriscus\nPachysaurops\nPachysaurus\nPachyspondylus\nPachysuchus\nPadillasaurus\nPakisaurus\nPalaeoctonus\nPalaeocursornis\nPalaeolimnornis\nPalaeopteryx\nPalaeosauriscus\nPalaeosaurus\nPalaeosaurus\nPalaeoscincus\nPaleosaurus\nPa
ludititan\nPaluxysaurus\nPampadromaeus\nPamparaptor\nPanamericansaurus\nPandoravenator\nPanguraptor\nPanoplosaurus\nPanphagia\nPantydraco\nParaiguanodon\nParalititan\nParanthodon\nPararhabdodon\nParasaurolophus\nPareiasaurus\nParksosaurus\nParonychodon\nParrosaurus\nParvicursor\nPatagonykus\nPatagosaurus\nPatagotitan\nPawpawsaurus\nPectinodon\nPedopenna\nPegomastax\nPeishansaurus\nPekinosaurus\nPelecanimimus\nPellegrinisaurus\nPeloroplites\nPelorosaurus\nPeltosaurus\nPenelopognathus\nPentaceratops\nPetrobrasaurus\nPhaedrolosaurus\nPhilovenator\nPhuwiangosaurus\nPhyllodon\nPiatnitzkysaurus\nPicrodon\nPinacosaurus\nPisanosaurus\nPitekunsaurus\nPiveteausaurus\nPlanicoxa\nPlateosauravus\nPlateosaurus\nPlatyceratops\nPlesiohadros\nPleurocoelus\nPleuropeltus\nPneumatoarthrus\nPneumatoraptor\nPodokesaurus\nPoekilopleuron\nPolacanthoides\nPolacanthus\nPolyodontosaurus\nPolyonax\nPonerosteus\nPoposaurus\nPostosuchus\nPowellvenator\nPradhania\nPrenocephale\nPrenoceratops\nPriconodon\nPriodontognathus\nProa\nProbactrosaurus\nProbrachylophosaurus\nProceratops\nProceratosaurus\nProcerosaurus\nProcerosaurus\nProcheneosaurus\nProcompsognathus\nProdeinodon\nProiguanodon\nPropanoplosaurus\nProplanicoxa\nProsaurolophus\nProtarchaeopteryx\nProtecovasaurus\nProtiguanodon\nProtoavis\nProtoceratops\nProtognathosaurus\nProtognathus\nProtohadros\nProtorosaurus\nProtorosaurus\nProtrachodon\nProyandusaurus\nPseudolagosuchus\nPsittacosaurus\nPteropelyx\nPterospondylus\nPuertasaurus\nPukyongosaurus\nPulanesaura\nPycnonemosaurus\nPyroraptor\nQantassaurus\nQianzhousaurus\nQiaowanlong\nQijianglong\nQinlingosaurus\nQingxiusaurus\nQiupalong\nQuaesitosaurus\nQuetecsaurus\nQuilmesaurus\nRachitrema\nRahiolisaurus\nRahona\nRahonavis\nRajasaurus\nRapator\nRapetosaurus\nRaptorex\nRatchasimasaurus\nRativates\nRayososaurus\nRazanandrongobe\nRebbachisaurus\nRegaliceratops\nRegnosaurus\nRevueltosaurus\nRhabdodon\nRhadinosaurus\nRhinorex\nRhodanosaurus\nRhoetosaurus\nRhopalodon\nRiabininohadros\nRichardoestesia\nRileya\nRileyasuchus\nRinchenia\nRinconsaurus\nRioarribasaurus\nRiodevasaurus\nRiojasaurus\nRiojasuchus\nRocasaurus\nRoccosaurus\nRubeosaurus\nRuehleia\nRugocaudia\nRugops\nRukwatitan\nRuyangosaurus\nSacisaurus\nSahaliyania\nSaichania\nSaldamosaurus\nSalimosaurus\nSaltasaurus\nSaltopus\nSaltriosaurus\nSanchusaurus\nSangonghesaurus\nSanjuansaurus\nSanpasaurus\nSantanaraptor\nSaraikimasoom\nSarahsaurus\nSarcolestes\nSarcosaurus\nSarmientosaurus\nSaturnalia\nSauraechinodon\nSaurolophus\nSauroniops\nSauropelta\nSaurophaganax\nSaurophagus\nSauroplites\nSauroposeidon\nSaurornithoides\nSaurornitholestes\nSavannasaurus\nScansoriopteryx\nScaphonyx\nScelidosaurus\nScipionyx\nSciurumimus\nScleromochlus\nScolosaurus\nScutellosaurus\nSecernosaurus\nSefapanosaurus\nSegisaurus\nSegnosaurus\nSeismosaurus\nSeitaad\nSelimanosaurus\nSellacoxa\nSellosaurus\nSerendipaceratops\nSerikornis\nShamosaurus\nShanag\nShanshanosaurus\nShantungosaurus\nShanxia\nShanyangosaurus\nShaochilong\nShenzhousaurus\nShidaisaurus\nShingopana\nShixinggia\nShuangbaisaurus\nShuangmiaosaurus\nShunosaurus\nShuvosaurus\nShuvuuia\nSiamodon\nSiamodracon\nSiamosaurus\nSiamotyrannus\nSiats\nSibirosaurus\nSibirotitan\nSidormimus\nSigilmassasaurus\nSilesaurus\nSiluosaurus\nSilvisaurus\nSimilicaudipteryx\nSinocalliopteryx\nSinoceratops\nSinocoelurus\nSinopelta\nSinopeltosaurus\nSinornithoides\nSinornithomimus\nSinornithosaurus\nSinosauropteryx\nSinosaurus\nSinotyrannus\nSinovenator\nSinraptor\nSinusonasus\nSirindhorna\nSkorpiovenator\nSmilodon\nSonidosaurus\nSono
rasaurus\nSoriatitan\nSphaerotholus\nSphenosaurus\nSphenospondylus\nSpiclypeus\nSpinophorosaurus\nSpinops\nSpinosaurus\nSpinostropheus\nSpinosuchus\nSpondylosoma\nSqualodon\nStaurikosaurus\nStegoceras\nStegopelta\nStegosaurides\nStegosaurus\nStenonychosaurus\nStenopelix\nStenotholus\nStephanosaurus\nStereocephalus\nSterrholophus\nStokesosaurus\nStormbergia\nStrenusaurus\nStreptospondylus\nStruthiomimus\nStruthiosaurus\nStygimoloch\nStygivenator\nStyracosaurus\nSuccinodon\nSuchomimus\nSuchosaurus\nSuchoprion\nSugiyamasaurus\nSulaimanisaurus\nSupersaurus\nSuuwassea\nSuzhousaurus\nSymphyrophus\nSyngonosaurus\nSyntarsus\nSyrmosaurus\nSzechuanosaurus\nTachiraptor\nTalarurus\nTalenkauen\nTalos\nTambatitanis\nTangvayosaurus\nTanius\nTanycolagreus\nTanystropheus\nTanystrosuchus\nTaohelong\nTapinocephalus\nTapuiasaurus\nTarascosaurus\nTarbosaurus\nTarchia\nTastavinsaurus\nTatankacephalus\nTatankaceratops\nTataouinea\nTatisaurus\nTaurovenator\nTaveirosaurus\nTawa\nTawasaurus\nTazoudasaurus\nTechnosaurus\nTecovasaurus\nTehuelchesaurus\nTeihivenator\nTeinurosaurus\nTeleocrater\nTelmatosaurus\nTenantosaurus\nTenchisaurus\nTendaguria\nTengrisaurus\nTenontosaurus\nTeratophoneus\nTeratosaurus\nTermatosaurus\nTethyshadros\nTetragonosaurus\nTexacephale\nTexasetes\nTeyuwasu\nThecocoelurus\nThecodontosaurus\nThecospondylus\nTheiophytalia\nTherizinosaurus\nTherosaurus\nThescelosaurus\nThespesius\nThotobolosaurus\nTianchisaurus\nTianchungosaurus\nTianyulong\nTianyuraptor\nTianzhenosaurus\nTichosteus\nTienshanosaurus\nTimimus\nTimurlengia\nTitanoceratops\nTitanosaurus\nTitanosaurus\nTochisaurus\nTomodon\nTonganosaurus\nTongtianlong\nTonouchisaurus\nTorilion\nTornieria\nTorosaurus\nTorvosaurus\nTototlmimus\nTrachodon\nTraukutitan\nTrialestes\nTriassolestes\nTribelesodon\nTriceratops\nTrigonosaurus\nTrimucrodon\nTrinisaura\nTriunfosaurus\nTroodon\nTsaagan\nTsagantegia\nTsintaosaurus\nTugulusaurus\nTuojiangosaurus\nTuranoceratops\nTuriasaurus\nTylocephale\nTylosteus\nTyrannosaurus\nTyrannotitan\nUberabatitan\nUdanoceratops\nUgrosaurus\nUgrunaaluk\nUintasaurus\nUltrasauros\nUltrasaurus\nUltrasaurus\nUmarsaurus\nUnaysaurus\nUnenlagia\nUnescoceratops\nUnicerosaurus\nUnquillosaurus\nUrbacodon\nUtahceratops\nUtahraptor\nUteodon\nVagaceratops\nVahiny\nValdoraptor\nValdosaurus\nVariraptor\nVectensia\nVectisaurus\nVelafrons\nVelocipes\nVelociraptor\nVelocisaurus\nVenaticosuchus\nVenenosaurus\nVeterupristisaurus\nViavenator\nVitakridrinda\nVitakrisaurus\nVolkheimeria\nVouivria\nVulcanodon\nWadhurstia\nWakinosaurus\nWalgettosuchus\nWalkeria\nWalkersaurus\nWangonisaurus\nWannanosaurus\nWellnhoferia\nWendiceratops\nWiehenvenator\nWillinakaqe\nWintonotitan\nWuerhosaurus\nWulagasaurus\nWulatelong\nWyleyia\nWyomingraptor\nXenoceratops\nXenoposeidon\nXenotarsosaurus\nXianshanosaurus\nXiaosaurus\nXingxiulong\nXinjiangovenator\nXinjiangtitan\nXiongguanlong\nXixianykus\nXixiasaurus\nXixiposaurus\nXuanhanosaurus\nXuanhuaceratops\nXuanhuasaurus\nXuwulong\nYaleosaurus\nYamaceratops\nYandusaurus\nYangchuanosaurus\nYaverlandia\nYehuecauhceratops\nYezosaurus\nYibinosaurus\nYimenosaurus\nYingshanosaurus\nYinlong\nYixianosaurus\nYizhousaurus\nYongjinglong\nYuanmouraptor\nYuanmousaurus\nYueosaurus\nYulong\nYunganglong\nYunmenglong\nYunnanosaurus\nYunxianosaurus\nYurgovuchia\nYutyrannus\nZanabazar\nZanclodon\nZapalasaurus\nZapsalis\nZaraapelta\nZatomus\nZby\nZephyrosaurus\nZhanghenglong\nZhejiangosaurus\nZhenyuanlong\nZhongornis\nZhongjianosaurus\nZhongyuansaurus\nZhuchengceratops\nZhuchengosaurus\nZhuchengti
tan\nZhuchengtyrannus\nZiapelta\nZigongosaurus\nZizhongosaurus\nZuniceratops\nZunityrannus\nZuolong\nZuoyunlong\nZupaysaurus\nZuul"
  },
  {
    "path": "gradient_checking.py",
    "content": "import numpy as np\nfrom sklearn.datasets import  load_breast_cancer\nfrom sklearn.model_selection import train_test_split\n\n#initialize parameters(w,b)\ndef initialize_parameters(layer_dims):\n\t\"\"\"\n\t:param layer_dims: list,每一层单元的个数（维度）\n\t:return:dictionary,存储参数w1,w2,...,wL,b1,...,bL\n\t\"\"\"\n\tnp.random.seed(1)\n\tL = len(layer_dims)#the number of layers in the network\n\tparameters = {}\n\tfor l in range(1,L):\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l],layer_dims[l-1])*0.01\n\t\t# parameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1])*np.sqrt(2/layer_dims[l-1]) # he initialization\n\t\t# parameters[\"W\" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1])) #为了测试初始化为0的后果\n\t\tparameters[\"W\" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(1 / layer_dims[l - 1])  # xavier initialization\n\t\tparameters[\"b\" + str(l)] = np.zeros((layer_dims[l],1))\n\treturn parameters\n\ndef relu(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\tA: output of activation\n\t\"\"\"\n\tA = np.maximum(0,Z)\n\treturn A\n\n#implement the activation function(ReLU and sigmoid)\ndef sigmoid(Z):\n\t\"\"\"\n\t:param Z: Output of the linear layer\n\t:return:\n\t\"\"\"\n\tA = 1 / (1 + np.exp(-Z))\n\treturn A\n\ndef forward_propagation(X, parameters):\n\t\"\"\"\n\tX -- input dataset, of shape (input size, number of examples)\n    parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"W2\", \"b2\",...,\"WL\", \"bL\"\n                    W -- weight matrix of shape (size of current layer, size of previous layer)\n                    b -- bias vector of shape (size of current layer,1)\n    :return:\n\tAL: the output of the last Layer(y_predict)\n\tcaches: list, every element is a tuple:(W,b,z,A_pre)\n\t\"\"\"\n\tL = len(parameters) // 2  # number of layer\n\tA = X\n\tcaches = [(None,None,None,X)]  # 第0层(None,None,None,A0) w,b,z用none填充,下标与层数一致，用于存储每一层的，w,b,z,A\n\t# calculate from 1 to L-1 layer\n\tfor l in range(1,L):\n\t\tA_pre = A\n\t\tW = parameters[\"W\" + str(l)]\n\t\tb = parameters[\"b\" + str(l)]\n\t\tz = np.dot(W,A_pre) + b #计算z = wx + b\n\t\tA = relu(z) #relu activation function\n\t\tcaches.append((W,b,z,A))\n\t# calculate Lth layer\n\tWL = parameters[\"W\" + str(L)]\n\tbL = parameters[\"b\" + str(L)]\n\tzL = np.dot(WL,A) + bL\n\tAL = sigmoid(zL)\n\tcaches.append((WL,bL,zL,AL))\n\treturn AL, caches\n\n#calculate cost function\ndef compute_cost(AL,Y):\n\t\"\"\"\n\t:param AL: 最后一层的激活值，即预测值，shape:(1,number of examples)\n\t:param Y:真实值,shape:(1, number of examples)\n\t:return:\n\t\"\"\"\n\tm = Y.shape[1]\n\tcost = 1. 
/ m * np.nansum(np.multiply(-np.log(AL), Y) + np.multiply(-np.log(1 - AL), 1 - Y))\n\t#从数组的形状中删除单维条目，即把shape中为1的维度去掉，比如把[[[2]]]变成2\n\tcost = np.squeeze(cost)\n\treturn cost\n\n\n# derivation of relu\ndef relu_backward(Z):\n\t\"\"\"\n\t:param Z: the input of activation\n\t:return:\n\t\"\"\"\n\tdA = np.int64(Z > 0)\n\treturn dA\n\ndef backward_propagation(AL, Y, caches):\n\t\"\"\"\n\tImplement the backward propagation presented in figure 2.\n\tArguments:\n\tX -- input dataset, of shape (input size, number of examples)\n\tY -- true \"label\" vector (containing 0 if cat, 1 if non-cat)\n\tcaches -- caches output from forward_propagation(),(W,b,z,A)\n\n\tReturns:\n\tgradients -- A dictionary with the gradients with respect to dW,db\n\t\"\"\"\n\tm = Y.shape[1]\n\tL = len(caches) - 1\n\t# print(\"L:   \" + str(L))\n\t#calculate the Lth layer gradients\n\tprev_AL = caches[L-1][3]\n\tdzL = 1./m * (AL - Y)\n\t# print(dzL.shape)\n\t# print(prev_AL.T.shape)\n\tdWL = np.dot(dzL, prev_AL.T)\n\tdbL = np.sum(dzL, axis=1, keepdims=True)\n\tgradients = {\"dW\"+str(L):dWL, \"db\"+str(L):dbL}\n\t#calculate from L-1 to 1 layer gradients\n\tfor l in reversed(range(1,L)): # L-1,L-3,....,1\n\t\tpost_W= caches[l+1][0] #要用后一层的W\n\t\tdz = dzL #用后一层的dz\n\n\t\tdal = np.dot(post_W.T, dz)\n\t\tz = caches[l][2]#当前层的z\n\t\tdzl = np.multiply(dal, relu_backward(z))#可以直接用dzl = np.multiply(dal, np.int64(Al > 0))来实现\n\t\tprev_A = caches[l-1][3]#前一层的A\n\t\tdWl = np.dot(dzl, prev_A.T)\n\t\tdbl = np.sum(dzl, axis=1, keepdims=True)\n\n\t\tgradients[\"dW\" + str(l)] = dWl\n\t\tgradients[\"db\" + str(l)] = dbl\n\t\tdzL = dzl #更新dz\n\treturn gradients\n\n#convert parameter into vector\ndef dictionary_to_vector(parameters):\n\t\"\"\"\n\tRoll all our parameters dictionary into a single vector satisfying our specific required shape.\n\t\"\"\"\n\tcount = 0\n\tfor key in parameters:\n\t\t# flatten parameter\n\t\tnew_vector = np.reshape(parameters[key], (-1, 1))#convert matrix into vector\n\t\tif count == 0:#刚开始时新建一个向量\n\t\t\ttheta = new_vector\n\t\telse:\n\t\t\ttheta = np.concatenate((theta, new_vector), axis=0)#和已有的向量合并成新向量\n\t\tcount = count + 1\n\n\treturn theta\n\n#convert gradients into vector\ndef gradients_to_vector(gradients):\n\t\"\"\"\n\tRoll all our parameters dictionary into a single vector satisfying our specific required shape.\n\t\"\"\"\n\t# 因为gradient的存储顺序是{dWL,dbL,....dW2,db2,dW1,db1}，为了统一采用[dW1,db1,...dWL,dbL]方面后面求欧式距离（对应元素）\n\tL = len(gradients) // 2\n\tkeys = []\n\tfor l in range(L):\n\t\tkeys.append(\"dW\" + str(l + 1))\n\t\tkeys.append(\"db\" + str(l + 1))\n\tcount = 0\n\tfor key in keys:\n\t\t# flatten parameter\n\t\tnew_vector = np.reshape(gradients[key], (-1, 1))#convert matrix into vector\n\t\tif count == 0:#刚开始时新建一个向量\n\t\t\ttheta = new_vector\n\t\telse:\n\t\t\ttheta = np.concatenate((theta, new_vector), axis=0)#和已有的向量合并成新向量\n\t\tcount = count + 1\n\n\treturn theta\n\n#convert vector into dictionary\ndef vector_to_dictionary(theta, layer_dims):\n\t\"\"\"\n    Unroll all our parameters dictionary from a single vector satisfying our specific required shape.\n    \"\"\"\n\tparameters = {}\n\tL = len(layer_dims)  # the number of layers in the network\n\tstart = 0\n\tend = 0\n\tfor l in range(1, L):\n\t\tend += layer_dims[l]*layer_dims[l-1]\n\t\tparameters[\"W\" + str(l)] = theta[start:end].reshape((layer_dims[l],layer_dims[l-1]))\n\t\tstart = end\n\t\tend += layer_dims[l]*1\n\t\tparameters[\"b\" + str(l)] = theta[start:end].reshape((layer_dims[l],1))\n\t\tstart = end\n\treturn parameters\n\n\ndef 
gradient_check(parameters, gradients, X, Y, layer_dims, epsilon=1e-7):\n\t\"\"\"\n\tChecks if backward_propagation_n computes correctly the gradient of the cost output by forward_propagation_n\n\n\tArguments:\n\tparameters -- python dictionary containing your parameters \"W1\", \"b1\", \"W2\", \"b2\", \"W3\", \"b3\":\n\tgrad -- output of backward_propagation_n, contains gradients of the cost with respect to the parameters.\n\tx -- input datapoint, of shape (input size, 1)\n\ty -- true \"label\"\n\tepsilon -- tiny shift to the input to compute approximated gradient with formula(1)\n\tlayer_dims -- the layer dimension of nn\n\tReturns:\n\tdifference -- difference (2) between the approximated gradient and the backward propagation gradient\n\t\"\"\"\n\n\tparameters_vector = dictionary_to_vector(parameters)  # parameters_values\n\tgrad = gradients_to_vector(gradients)\n\tnum_parameters = parameters_vector.shape[0]\n\tJ_plus = np.zeros((num_parameters, 1))\n\tJ_minus = np.zeros((num_parameters, 1))\n\tgradapprox = np.zeros((num_parameters, 1))\n\n\t# Compute gradapprox\n\tfor i in range(num_parameters):\n\t\tthetaplus = np.copy(parameters_vector)\n\t\tthetaplus[i] = thetaplus[i] + epsilon\n\t\tAL, _ = forward_propagation(X, vector_to_dictionary(thetaplus,layer_dims))\n\t\tJ_plus[i] = compute_cost(AL,Y)\n\n\t\tthetaminus = np.copy(parameters_vector)\n\t\tthetaminus[i] = thetaminus[i] - epsilon\n\t\tAL, _ = forward_propagation(X, vector_to_dictionary(thetaminus, layer_dims))\n\t\tJ_minus[i] = compute_cost(AL,Y)\n\t\tgradapprox[i] = (J_plus[i] - J_minus[i]) / (2 * epsilon)\n\n\tnumerator = np.linalg.norm(grad - gradapprox)\n\tdenominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)\n\tdifference = numerator / denominator\n\n\tif difference > 2e-7:\n\t\tprint(\n\t\t\t\"\\033[93m\" + \"There is a mistake in the backward propagation! difference = \" + str(difference) + \"\\033[0m\")\n\telse:\n\t\tprint(\n\t\t\t\"\\033[92m\" + \"Your backward propagation works perfectly fine! difference = \" + str(difference) + \"\\033[0m\")\n\n\treturn difference\n\n\nif __name__ == \"__main__\":\n\tX_data, y_data = load_breast_cancer(return_X_y=True)\n\tX_train, X_test,y_train,y_test = train_test_split(X_data, y_data, train_size=0.8,test_size=0.2,random_state=28)\n\tX_train = X_train.T\n\ty_train = y_train.reshape(y_train.shape[0], -1).T\n\tX_test = X_test.T\n\ty_test = y_test.reshape(y_test.shape[0], -1).T\n\n\t#根据自己实现的bp计算梯度\n\tparameters = initialize_parameters([X_train.shape[0],5,3,1])\n\tAL, caches = forward_propagation(X_train,parameters)\n\tcost = compute_cost(AL,y_train)\n\tgradients = backward_propagation(AL,y_train,caches)\n\t#gradient checking\n\t# # print(X_train.shape[0])\n\tdifference = gradient_check(parameters, gradients, X_train, y_train,[X_train.shape[0],5,3,1])\n"
  },
  {
    "path": "rnn.py",
    "content": "import numpy as np\n\n\ndef initialize_parameters(n_a, n_x, n_y):\n\t\"\"\"\n\tInitialize parameters with small random values\n\tReturns:\n\tparameters -- python dictionary containing:\n\t\t\t\t\t\tWax -- Weight matrix multiplying the input, of shape (n_a, n_x)\n\t\t\t\t\t\tWaa -- Weight matrix multiplying the hidden state, of shape (n_a, n_a)\n\t\t\t\t\t\tWya -- Weight matrix relating the hidden-state to the output, of shape (n_y, n_a)\n\t\t\t\t\t\tb --  Bias, numpy array of shape (n_a, 1)\n\t\t\t\t\t\tby -- Bias relating the hidden-state to the output, of shape (n_y, 1)\n\t\"\"\"\n\tnp.random.seed(1)\n\tWax = np.random.randn(n_a, n_x) * 0.01  # input to hidden\n\tWaa = np.random.randn(n_a, n_a) * 0.01  # hidden to hidden\n\tWya = np.random.randn(n_y, n_a) * 0.01  # hidden to output\n\tba = np.zeros((n_a, 1))  # hidden bias\n\tby = np.zeros((n_y, 1))  # output bias\n\tparameters = {\"Wax\": Wax, \"Waa\": Waa, \"Wya\": Wya, \"ba\": ba, \"by\": by}\n\n\treturn parameters\n\n\ndef softmax(x):\n\t#这里减去最大值，是为了防止大数溢出，根据softmax的参数冗余性，减去任意一个数，结果不变\n\te_x = np.exp(x - np.max(x))\n\treturn e_x / np.sum(e_x, axis=0)\n\n\ndef rnn_step_forward(xt, a_prev, parameters):\n\t\"\"\"\n\tImplements a single forward step of the RNN-cell that uses a tanh\n    activation function\n\n\tArguments:\n\txt -- the input data at timestep \"t\", of shape (n_x, m).\n\ta_prev -- Hidden state at timestep \"t-1\", of shape (n_a, m)\n\t**here, n_x denotes the dimension of word vector, n_a denotes the number of hidden units in a RNN cell\n\tparameters -- python dictionary containing:\n\t\t\t\t\t\tWax -- Weight matrix multiplying the input, of shape (n_a, n_x)\n\t\t\t\t\t\tWaa -- Weight matrix multiplying the hidden state, of shape (n_a, n_a)\n\t\t\t\t\t\tWya -- Weight matrix relating the hidden-state to the output, of shape (n_y, n_a)\n\t\t\t\t\t\tba --  Bias,of shape (n_a, 1)\n\t\t\t\t\t\tby -- Bias relating the hidden-state to the output,of shape (n_y, 1)\n\tReturns:\n\ta_next -- next hidden state, of shape (n_a, 1)\n\tyt_pred -- prediction at timestep \"t\", of shape (n_y, 1)\n\t\"\"\"\n\n\t# get parameters from \"parameters\"\n\tWax = parameters[\"Wax\"] #(n_a, n_x)\n\tWaa = parameters[\"Waa\"] #(n_a, n_a)\n\tWya = parameters[\"Wya\"] #(n_y, n_a)\n\tba = parameters[\"ba\"]   #(n_a, 1)\n\tby = parameters[\"by\"]   #(n_y, 1)\n\n\ta_next = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, xt) + ba) #(n_a, 1)\n\tyt_pred = softmax(np.dot(Wya, a_next) + by) #(n_y,1)\n\n\treturn a_next, yt_pred\n\n\ndef rnn_forward(X, Y, a0, parameters, vocab_size=27):\n\tx, a, y_hat = {}, {}, {}\n\ta[-1] = np.copy(a0)\n\t# initialize your loss to 0\n\tloss = 0\n\n\tfor t in range(len(X)):\n\t\t# Set x[t] to be the one-hot vector representation of the t'th character in X.\n\t\t# if X[t] == None, we just have x[t]=0. 
This is used to set the input for the first timestep to the zero vector.\n\t\tx[t] = np.zeros((vocab_size, 1))\n\t\tif (X[t] != None):\n\t\t\tx[t][X[t]] = 1\n\t\t# Run one step forward of the RNN\n\t\ta[t], y_hat[t] = rnn_step_forward(x[t], a[t - 1], parameters) #a[t]: (n_a,1), y_hat[t]:(n_y,1)\n\t\t# Update the loss by substracting the cross-entropy term of this time-step from it.\n\t\t#这里因为真实的label也是采用onehot表示的，因此只要把label向量中1对应的概率拿出来就行了\n\t\tloss -= np.log(y_hat[t][Y[t], 0])\n\n\tcache = (y_hat, a, x)\n\n\treturn loss, cache\n\n\ndef rnn_step_backward(dy, gradients, parameters, x, a, a_prev):\n\n\tgradients['dWya'] += np.dot(dy, a.T)\n\tgradients['dby'] += dy\n\tWya = parameters['Wya']\n\tWaa = parameters['Waa']\n\tda = np.dot(Wya.T, dy) + gradients['da_next']  #每个cell的Upstream有两条，一条da_next过来的，一条y_hat过来的\n\tdtanh = (1 - a * a) * da  # backprop through tanh nonlinearity\n\tgradients['dba'] += dtanh\n\tgradients['dWax'] += np.dot(dtanh, x.T)\n\tgradients['dWaa'] += np.dot(dtanh, a_prev.T)\n\tgradients['da_next'] = np.dot(Waa.T, dtanh)\n\n\treturn gradients\n\n\ndef rnn_backward(X, Y, parameters, cache):\n\t# Initialize gradients as an empty dictionary\n\tgradients = {}\n\n\t# Retrieve from cache and parameters\n\t(y_hat, a, x) = cache\n\tWaa, Wax, Wya, by, ba = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['ba']\n\n\t# each one should be initialized to zeros of the same dimension as its corresponding parameter\n\tgradients['dWax'], gradients['dWaa'], gradients['dWya'] = np.zeros_like(Wax), np.zeros_like(Waa), np.zeros_like(Wya)\n\tgradients['dba'], gradients['dby'] = np.zeros_like(ba), np.zeros_like(by)\n\tgradients['da_next'] = np.zeros_like(a[0])\n\n\t# Backpropagate through time\n\tfor t in reversed(range(len(X))):\n\t\tdy = np.copy(y_hat[t])\n\t\tdy[Y[t]] -= 1 #计算y_hat - y,即预测值-真实值，因为真实值是one-hot向量，只有1个1，其它都是0，\n\t\t# 所以只要在预测值（向量）对应位置减去1即可，其它位置减去0相当于没变\n\t\tgradients = rnn_step_backward(dy, gradients, parameters, x[t], a[t], a[t - 1])\n\n\treturn gradients, a\n\n#梯度裁剪\ndef clip(gradients, maxValue):\n\t'''\n\tClips the gradients' values between minimum and maximum.\n\n\tArguments:\n\tgradients -- a dictionary containing the gradients \"dWaa\", \"dWax\", \"dWya\", \"db\", \"dby\"\n\tmaxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue\n\n\tReturns:\n\tgradients -- a dictionary with the clipped gradients.\n\t'''\n\n\tdWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['dba'], gradients['dby']\n\n\t# clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby].\n\tfor gradient in [dWax, dWaa, dWya, db, dby]:\n\t\tnp.clip(gradient, -maxValue, maxValue, gradient)\n\n\tgradients = {\"dWaa\": dWaa, \"dWax\": dWax, \"dWya\": dWya, \"dba\": db, \"dby\": dby}\n\n\treturn gradients\n\n\n\ndef update_parameters(parameters, gradients, lr):\n\tparameters['Wax'] += -lr * gradients['dWax']\n\tparameters['Waa'] += -lr * gradients['dWaa']\n\tparameters['Wya'] += -lr * gradients['dWya']\n\tparameters['ba']  += -lr * gradients['dba']\n\tparameters['by']  += -lr * gradients['dby']\n\treturn parameters\n\n\ndef sample(parameters, char_to_ix, seed):\n\t\"\"\"\n\tSample a sequence of characters according to a sequence of probability distributions output of the RNN\n\tArguments:\n\tparameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b.\n\tchar_to_ix -- python dictionary mapping each character to an index.\n\tseed -- used 
for grading purposes. Do not worry about it.\n\tReturns:\n\tindices -- a list of length n containing the indices of the sampled characters.\n\t\"\"\"\n\n\t# Retrieve parameters and relevant shapes from \"parameters\" dictionary\n\tWaa, Wax, Wya, by, ba = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['ba']\n\tvocab_size = by.shape[0]\n\tn_a = Waa.shape[1]\n\t# Step 1: Create the one-hot vector x for the first character (initializing the sequence generation).\n\tx = np.zeros((vocab_size, 1))\n\t# Step 1': Initialize a_prev as zeros\n\ta_prev = np.zeros((n_a, 1))\n\n\t# Create an empty list of indices, this is the list which will contain the list of indices of the characters to generate\n\tindices = []\n\n\t# Idx is a flag to detect a newline character, we initialize it to -1\n\tidx = -1\n\n\t# Loop over time-steps t. At each time-step, sample a character from a probability distribution and append\n\t# its index to \"indices\". We'll stop if we reach 50 characters (which should be very unlikely with a well\n\t# trained model), which helps debugging and prevents entering an infinite loop.\n\tcounter = 0\n\tnewline_character = char_to_ix['\\n']\n\n\twhile (idx != newline_character and counter != 50):\n\t\t# Step 2: Forward propagate x using the equations (1), (2) and (3)\n\t\ta = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + ba)\n\t\tz = np.dot(Wya, a) + by\n\t\ty = softmax(z)\n\n\t\t# for grading purposes\n\t\tnp.random.seed(counter + seed)\n\n\t\t# Step 3: Sample the index of a character within the vocabulary from the probability distribution y\n\t\tidx = np.random.choice(vocab_size, p = y.ravel())  # 等价于np.random.choice([0,1,...,vocab_size-1], p = y.ravel())，\n\t\t# 一维数组或者int型变量，如果是数组，就按照里面的范围来进行采样，如果是单个变量，则采用np.arange(a)的形式\n\t\t# Append the index to \"indices\"\n\t\tindices.append(idx)\n\t\t# Step 4: Overwrite the input character as the one corresponding to the sampled index.\n\t\t#每次生成的字符是下一个时间步的输入\n\t\tx = np.zeros((vocab_size, 1))\n\t\tx[idx] = 1\n\n\t\t# Update \"a_prev\" to be \"a\"\n\t\ta_prev = a\n\t\t# for grading purposes\n\t\tseed += 1\n\t\tcounter += 1\n\tif (counter == 50):\n\t\tindices.append(char_to_ix['\\n'])\n\n\treturn indices\n\n\ndef optimize(X, Y, a_prev, parameters, learning_rate=0.01):\n\t\"\"\"\n\tExecute one step of the optimization to train the model.\n\n\tArguments:\n\tX -- list of integers, where each integer is a number that maps to a character in the vocabulary.\n\tY -- list of integers, exactly the same as X but shifted one index to the left.\n\ta_prev -- previous hidden state.\n\tparameters -- python dictionary containing:\n\t\t\t\t\t\tWax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)\n\t\t\t\t\t\tWaa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)\n\t\t\t\t\t\tWya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)\n\t\t\t\t\t\tb --  Bias, numpy array of shape (n_a, 1)\n\t\t\t\t\t\tby -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)\n\tlearning_rate -- learning rate for the model.\n\n\tReturns:\n\tloss -- value of the loss function (cross-entropy)\n\tgradients -- python dictionary containing:\n\t\t\t\t\t\tdWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)\n\t\t\t\t\t\tdWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)\n\t\t\t\t\t\tdWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)\n\t\t\t\t\t\tdb -- Gradients of bias vector, of shape (n_a, 
1)\n\t\t\t\t\t\tdby -- Gradients of output bias vector, of shape (n_y, 1)\n\ta[len(X)-1] -- the last hidden state, of shape (n_a, 1)\n\t\"\"\"\n\n\t# Forward propagate through time\n\tloss, cache = rnn_forward(X, Y, a_prev, parameters)\n\n\t# Backpropagate through time\n\tgradients, a = rnn_backward(X, Y, parameters, cache)\n\n\t# Clip your gradients between -5 (min) and 5 (max)\n\tgradients = clip(gradients, 5)\n\n\t# Update parameters\n\tparameters = update_parameters(parameters, gradients, learning_rate)\n\n\treturn loss, parameters, a[len(X) - 1]\n\ndef get_initial_loss(vocab_size, seq_length):\n\treturn -np.log(1.0/vocab_size)*seq_length\n\ndef smooth(loss, cur_loss):\n\treturn loss * 0.999 + cur_loss * 0.001\n\ndef print_sample(sample_ix, ix_to_char):\n\ttxt = ''.join(ix_to_char[ix] for ix in sample_ix)\n\ttxt = txt[0].upper() + txt[1:]  # capitalize first character\n\tprint ('%s' % (txt, ), end='')\n\n\ndef model(data, ix_to_char, char_to_ix, num_iterations=35000, n_a=50, dino_names=7, vocab_size=27):\n\t\"\"\"\n\tTrains the model and generates dinosaur names.\n\n\tArguments:\n\tdata -- text corpus\n\tix_to_char -- dictionary that maps the index to a character\n\tchar_to_ix -- dictionary that maps a character to an index\n\tnum_iterations -- number of iterations to train the model for\n\tn_a -- number of units of the RNN cell\n\tdino_names -- number of dinosaur names you want to sample at each iteration.\n\tvocab_size -- number of unique characters found in the text, size of the vocabulary\n\n\tReturns:\n\tparameters -- learned parameters\n\t\"\"\"\n\n\t# Retrieve n_x and n_y from vocab_size\n\tn_x, n_y = vocab_size, vocab_size\n\n\t# Initialize parameters\n\tparameters = initialize_parameters(n_a, n_x, n_y)\n\t# Initialize loss (this is required because we want to smooth our loss, don't worry about it)\n\tloss = get_initial_loss(vocab_size, dino_names)\n\n\t# Initialize the hidden state of your LSTM\n\ta_prev = np.zeros((n_a, 1))\n\n\t# Optimization loop\n\tfor j in range(num_iterations):\n\n\t\t# Use the hint above to define one training example (X,Y) (≈ 2 lines)\n\t\tindex = j % len(data)\n\t\tX = [None] + [char_to_ix[ch] for ch in data[index]]\n\t\tY = X[1:] + [char_to_ix[\"\\n\"]]\n\t\t# Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters\n\t\t# Choose a learning rate of 0.01\n\t\tcurr_loss, parameters, a_prev = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)\n\t\t# Use a latency trick to keep the loss smooth. It happens here to accelerate the training.\n\t\tloss = smooth(loss, curr_loss)\n\t\t# Every 2000 Iteration, generate \"n\" characters thanks to sample() to check if the model is learning properly\n\t\tif j % 2000 == 0:\n\t\t\tprint('Iteration: %d, Loss: %f' % (j, loss) + '\\n')\n\t\t\t# The number of dinosaur names to print\n\t\t\tseed = 0\n\t\t\tfor name in range(dino_names):\n\t\t\t\t# Sample indices and print them\n\t\t\t\tsampled_indices = sample(parameters, char_to_ix, seed)\n\t\t\t\tprint_sample(sampled_indices, ix_to_char)\n\t\t\t\tseed += 1  # To get the same result for grading purposed, increment the seed by one.\n\n\t\t\tprint('\\n')\n\n\treturn parameters\n\n\nif __name__ == \"__main__\":\n\tdata = open('dinos.txt', 'r').read()\n\tdata = data.lower()\n\tchars = list(set(data))  # str->set,例如:'123'转set，会转为无序不重复的，形如:{'1','2','3'}\n\tprint(chars)\n\tdata_size, vocab_size = len(data), len(chars)\n\tprint('There are %d total characters and %d unique characters in your data.' 
% (data_size, vocab_size))\n\tchar_to_ix = {ch: i for i, ch in enumerate(sorted(chars))}\n\tix_to_char = {i: ch for i, ch in enumerate(sorted(chars))}\n\tprint(ix_to_char)\n\n\t# Build list of all dinosaur names (training examples).\n\twith open(\"dinos.txt\") as f:\n\t\texamples = f.readlines()\n\texamples = [x.lower().strip() for x in examples]\n\n\t# Shuffle list of all dinosaur names\n\tnp.random.seed(0)\n\tnp.random.shuffle(examples)\n\n\tparameters = model(examples, ix_to_char, char_to_ix)"
  }
]