[
  {
    "path": "README.md",
    "content": "# SimpleTPU\n\nA Tensor Processing Unit is designed to accelerate the matrix multiplication, especially for Multilayer perceptron and Convolution Nerual Network.    \nThis implmentaion is mainly following the Google TPU Version 1, which architecture is introduced in [https://arxiv.org/ftp/arxiv/papers/1704/1704.04760.pdf](https://arxiv.org/ftp/arxiv/papers/1704/1704.04760.pdf \"In-Datacenter Performance Analysis of a Tensor Processing Unit\").\n\nIt may cost a lot of time to implementation TPU using Hardware Description Language (such as VHDL or Verilog HDL), even if I had tried to simplify it. So I try to use the Xilinx HLS ToolKit to complete it. \n\nThe plan is divided into three phases.\n\n- Phase 1: Completing the main computing module,including\n    - Lab1:Systolic Array\n    - Lab2:Relu, Normalization & Pooling \n- Phase 2: Finish the full design of simpleTPU.\n- Phase 3: Testing the simpleTPU through some real network, such as MLP and CNN.\n\n# Key Features\n\nThe key features of Simple TPU including\n- Int8 mulitply & Int32 accumulators\n- VLIW based instruction parallel\n- Vector Architecture based data parallel\n\nHere are some operate which Simple TPU can support. 
\n\nOperation | Support\n-|-\nConv3d | in_channels: Resource Constrained<br>out_channels: Resource Constrained<br>kernel_size: Support<br>stride: Support<br>padding: Support<br>dilation: Support<br>groups: Architecture Constrained<br>bias: Support\nConvTranspose3d | The same as above\nMaxpool2d | kernel_size: Support<br>stride: Support<br>padding: Support    \nAvgpool2d | The same as above\nRelu | Relu is the only supported nonlinear function\nBatchNorm2d | BatchNorm2d is merged with Conv or Pool at inference time\nLinear | Resource Constrained \nUpscalingNearest2D | Support (by calling Avgpool2d multiple times)\nUpscalingBilinear2D | Support (by calling Avgpool2d multiple times)\n\n\n# Performance\nThe MAC array in SimpleTPU is 32*32 and the clock frequency is 500 MHz (timing closure is achieved on a Xilinx UltraScale+ FPGA, speed grade -2). Counting each MAC as two operations (one multiply and one add), the peak throughput is  \n$$32\\times 32 \\times 500\\,\\text{MHz} \\times 2 \\approx 1\\,\\text{TOPS (int8)}$$\n\n# Installation\n **env** :   \n - Vivado HLS 2018.2\n\n **run** :  \n - step1: `vivado_hls -f run_hls.tcl`\n - step2: launch Vivado HLS and open the project  \n - step3: Run C synthesis, C/RTL cosimulation, etc.\n\n**Synthesis Result**:    \n![result](./pictures/syn.png)    \n**Simulation Result**:    \n![result](./pictures/sim.png)\n# Examples\n## 1. MLP\nThe network structure of the MLP is defined as follows.\n```\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nclass MLP(nn.Module):\n    def __init__(self):\n        super(MLP, self).__init__()\n        self.hidden = nn.Linear(784,64)\n        self.fc = nn.Linear(64,10)\n\n    def forward(self, x):\n        x = x.view(-1,784)\n        x = self.hidden(x)\n        x = self.fc(x)\n        return F.log_softmax(x, dim=1)\n```\n\nThe work efficiency of SimpleTPU on this network is about 84%.\n\n\n|LOC| Layers | Nonlinear function | Weights | Batch Size | % of Deployed|\n|---|---|---|----|----|----|\n|10 | 2 FC | Relu | 5M | 512 | 16%|\n\nClassification result on MNIST.\n\n![result](./pictures/cla_result.png)\n## 2. 
CNN\nBecause there is no compiler to generate the instructions, this plan was suspended.\nIf you want to know how to calculate a convolution using SimpleTPU, lab1 provides a simple example.\n\n\n# Related Link  \nhttps://www.cnblogs.com/sea-wind/p/10993958.html\n"
  },
  {
    "path": "data/golden_result.txt",
    "content": "7\n2\n1\n0\n4\n1\n4\n9\n6\n9\n0\n6\n9\n0\n1\n5\n9\n7\n6\n4\n9\n6\n6\n5\n4\n0\n7\n4\n0\n1\n3\n1\n3\n6\n7\n2\n7\n1\n2\n1\n1\n7\n4\n2\n6\n5\n1\n2\n4\n4\n6\n3\n5\n5\n6\n0\n4\n1\n9\n5\n7\n8\n4\n2\n7\n4\n6\n4\n3\n0\n7\n0\n2\n9\n1\n7\n3\n7\n9\n7\n9\n6\n2\n7\n8\n4\n7\n5\n6\n1\n3\n6\n4\n3\n1\n4\n1\n1\n6\n9\n6\n0\n5\n4\n9\n9\n2\n1\n4\n4\n8\n1\n3\n9\n7\n4\n4\n4\n9\n2\n5\n4\n7\n6\n4\n9\n0\n5\n8\n5\n6\n6\n5\n2\n8\n1\n0\n1\n6\n4\n6\n7\n3\n1\n9\n1\n8\n2\n0\n9\n9\n9\n5\n5\n1\n5\n6\n0\n3\n4\n4\n6\n5\n4\n6\n5\n4\n5\n1\n4\n4\n7\n2\n3\n2\n1\n1\n8\n1\n8\n1\n8\n5\n0\n8\n9\n2\n5\n0\n1\n1\n1\n0\n4\n0\n5\n1\n6\n4\n2\n3\n6\n1\n1\n1\n3\n9\n5\n2\n9\n4\n5\n9\n3\n9\n0\n3\n6\n5\n5\n7\n2\n2\n7\n1\n2\n8\n4\n1\n7\n3\n3\n8\n9\n7\n9\n2\n2\n4\n1\n5\n8\n8\n4\n2\n6\n0\n6\n4\n2\n4\n1\n9\n5\n7\n7\n2\n8\n2\n0\n8\n1\n7\n7\n9\n1\n8\n1\n8\n0\n3\n0\n1\n9\n9\n4\n1\n8\n2\n1\n2\n9\n7\n5\n9\n2\n6\n4\n1\n5\n4\n2\n9\n2\n0\n4\n0\n0\n2\n8\n6\n2\n1\n2\n4\n0\n2\n9\n4\n3\n3\n0\n0\n5\n1\n9\n6\n4\n0\n5\n1\n7\n9\n3\n0\n4\n2\n0\n7\n1\n1\n2\n1\n5\n3\n3\n4\n7\n8\n6\n6\n4\n1\n3\n5\n1\n0\n5\n1\n9\n1\n5\n0\n6\n1\n8\n5\n1\n9\n4\n4\n6\n7\n1\n5\n0\n6\n5\n6\n3\n7\n2\n0\n8\n8\n5\n4\n1\n1\n4\n0\n7\n3\n7\n6\n1\n6\n2\n1\n4\n2\n8\n6\n1\n9\n5\n2\n5\n4\n4\n2\n8\n3\n9\n2\n4\n6\n0\n3\n1\n7\n7\n3\n7\n9\n7\n1\n9\n2\n1\n4\n2\n9\n2\n0\n4\n9\n1\n4\n8\n1\n8\n4\n4\n9\n8\n8\n3\n7\n6\n0\n0\n3\n0\n8\n0\n6\n4\n8\n5\n3\n3\n2\n3\n9\n1\n2\n6\n8\n0\n5\n6\n6\n6\n9\n8\n8\n2\n2\n5\n8\n9\n6\n1\n8\n4\n1\n2\n8\n3\n1\n9\n7\n5\n4\n0\n8\n9\n9\n1\n0\n5\n2\n3\n7\n8\n9\n4\n0\n6\n3\n9\n1\n2\n1\n8\n1\n5\n6\n5\n2\n1\n"
  },
  {
    "path": "lab1/README.md",
    "content": "# Systolic Array \n\nSystolic Array implement in FPGA using Xilinx HLS.\n\n## 1.Env & Build  \n **env** :   \n - Vivado HLS 2018.2 or 2016.3 , MATLAB 2014a(for matlabcode)  \n \n **run** :  \n - step1: `vivado_hls -f run_hls.tcl`\n - step2: lanch vivado HLS and open the project  \n - step3: Run C synthesis, C/RTL cosimulation e.t.c\n\n## 2.Relative Link  \nhttps://www.cnblogs.com/sea-wind/p/10995360.html"
  },
  {
    "path": "lab1/refcode/conv3d.m",
    "content": "\nrng(0);\nfeature = randi([-128,127],14,14,32);\nweight = randi([-128,127],32,3,3,32);\nbias = randi([-1024,1023],1,32);\noutput = zeros(14,14,32);\n\nsaveparam(feature,weight,bias)\n\nout1 = convmxu(weight,feature,bias,2,2);\nout2 = convmxu(weight,feature,zeros(1,32),1,1);\nout3 = convmxu(weight,feature,zeros(1,32),1,2);\nout4 = convmxu(weight,feature,zeros(1,32),1,3);\nout5 = convmxu(weight,feature,zeros(1,32),2,1);\nout6 = convmxu(weight,feature,zeros(1,32),2,3);\nout7 = convmxu(weight,feature,zeros(1,32),3,1);\nout8 = convmxu(weight,feature,zeros(1,32),3,2);\nout9 = convmxu(weight,feature,zeros(1,32),3,3);\n\noutput = out1;\noutput(2:end,2:end,:) = output(2:end,2:end,:) + out2(1:end-1,1:end-1,:);\noutput(2:end,:,:) = output(2:end,:,:) + out3(1:end-1,:,:);\noutput(2:end,1:end-1,:) = output(2:end,1:end-1,:) + out4(1:end-1,2:end,:);\noutput(:,2:end,:) = output(:,2:end,:) + out5(:,1:end-1,:);\noutput(:,1:end-1,:) = output(:,1:end-1,:) + out6(:,2:end,:);\noutput(1:end-1,2:end,:) = output(1:end-1,2:end,:) + out7(2:end,1:end-1,:);\noutput(1:end-1,:,:) = output(1:end-1,:,:) + out8(2:end,:,:);\noutput(1:end-1,1:end-1,:) = output(1:end-1,1:end-1,:) + out9(2:end,2:end,:);\n\ngolden = zeros(14,14,32);\nfor k = 1:32\n   wk = reshape(weight(k,:,:,:),3,3,32);\n   wk = wk(end:-1:1,end:-1:1,end:-1:1);\n   tmp = convn(feature,wk,'same');\n   golden(:,:,k) = tmp(:,:,16)+bias(k);\nend\ngolden = int32(golden);\nfid = fopen('golden.dat','wb');\nfor i=1:14\n    for j=1:14\n        fwrite(fid,golden(i,j,:),'int32');\n    end\nend\nfclose(fid);"
  },
  {
    "path": "lab1/refcode/convmxu.m",
    "content": "function [out1] = convmxu(weight,feature,bias,index1,index2)\n%UNTITLED3 Summary of this function goes here\n%   Detailed explanation goes here\n\nout1 = zeros(14,14,32);\nfor i = 1:14\n    for j = 1:14\n        for k = 1:32\n            for c = 1:32\n                if(c==1)\n                    out1(i,j,k) = bias(k) + weight(k,index1,index2,c)*feature(i,j,c);\n                else\n                    out1(i,j,k) = out1(i,j,k) + weight(k,index1,index2,c)*feature(i,j,c);\n                end\n            end\n        end\n    end\nend\n\nend\n\n"
  },
  {
    "path": "lab1/refcode/saveparam.m",
    "content": "function [] = saveparam(feature,weight,bias)\n%UNTITLED2 Summary of this function goes here\n%   Detailed explanation goes here\n\nfeature = int8(feature);\nweight = int8(weight);\nbias = int32(bias);\nbias4 = bitand(bitshift(bias,-24),int32(255));\nbias3 = bitand(bitshift(bias,-16),int32(255));\nbias2 = bitand(bitshift(bias,-8),int32(255));\nbias1 = bitand(bias,int32(255));\nfid = fopen('feature.dat','wb');\nfor i=1:14\n    for j=1:14\n        fwrite(fid,feature(i,j,:),'int8');\n    end\nend\nfclose(fid);\n\nfid = fopen('weight.dat','wb');\nfor k=1:32\n    fwrite(fid,weight(:,2,2,k),'int8');\nend\nfwrite(fid,uint8(bias4),'uint8');\nfwrite(fid,uint8(bias3),'uint8');\nfwrite(fid,uint8(bias2),'uint8');\nfwrite(fid,uint8(bias1),'uint8');\nfor i=1:3\n    for j=1:3\n        for k=1:32\n            if(~(i==2&&j==2))\n                fwrite(fid,weight(:,i,j,k),'int8');\n            end\n        end\n        if(~(i==2&&j==2))\n            for k=1:32\n                fwrite(fid,0,'int32');\n            end\n        end\n    end\nend\nfclose(fid);\n\nend\n\n"
  },
  {
    "path": "lab1/run_hls.tcl",
    "content": "open_project -reset mxu_conv_prj\nset_top MXU\nadd_files src/tpu.h\nadd_files src/mxu.cpp\nadd_files -tb data/feature.dat\nadd_files -tb data/golden.dat\nadd_files -tb data/weight.dat\nadd_files -tb src/tb_mxu.cpp\n\nopen_solution -reset \"solution1\"\nset_part {xczu7cg-fbvb900-2-i} -tool vivado\ncreate_clock -period 2.5 -name default\n\ncsim_design\n# Do not perform any other steps\n# - The basic project will be opened in the GUI \nexit"
  },
  {
    "path": "lab1/src/mxu.cpp",
    "content": "\n#include \"tpu.h\"\n\nvoid SetWeight(WEIGHTDTYPE weight[512][MXU_COLNUM],WEIGHTDTYPE weightreg[MXU_ROWNUM+4][MXU_COLNUM],\n\t\tshort weight_raddr, bool enable){\n\tif(!enable)\n\t\treturn;\n\tfor(short i=weight_raddr;i<weight_raddr+4+MXU_ROWNUM;i++){\n#pragma HLS PIPELINE\n\t\tfor(int j=0;j<MXU_ROWNUM+4;j++){\n\t\t\tfor(int k=0;k<MXU_COLNUM;k++){\n\t\t\t\tif(j!=MXU_ROWNUM+3)\n\t\t\t\t\tweightreg[j][k] = weightreg[j+1][k];\n\t\t\t\telse\n\t\t\t\t\tweightreg[j][k] = weight[i][k];\n\t\t\t}\n\t\t}\n\t}\n}\n\nvoid MacArray(FEATDTYPE ubuf[16384][MXU_ROWNUM],WEIGHTDTYPE weightreg[4+MXU_ROWNUM][MXU_COLNUM],\n\t\tPSUMDTYPE psum[512][MXU_COLNUM],MXU_PARAM mxuparam,bool enable){\n\tif(!enable)\n\t\treturn;\n\tFEATDTYPE featreg[MXU_ROWNUM][MXU_COLNUM+MXU_ROWNUM-1];\n\tPSUMDTYPE psumreg[MXU_ROWNUM][MXU_COLNUM];\n    short ubuf_raddr_p1=0;\n    short ubuf_raddr_p2=0;\n    short ubuf_raddr_p3=0;\n    short psum_addr_p1[MXU_COLNUM];\n    short psum_addr_p2[MXU_COLNUM];\n    for(int i=0;i<MXU_COLNUM;i++){\n#pragma HLS UNROLL\n    \tpsum_addr_p1[i] = 0;\n    \tpsum_addr_p2[i] = 0;\n    }\n\tfor(short i=0;i<mxuparam.ubuf_raddr_num+MXU_ROWNUM+MXU_COLNUM-2;i++){\n#pragma HLS PIPELINE\n    short ubuf_raddr = mxuparam.ubuf_raddr_start + ubuf_raddr_p1 + ubuf_raddr_p2 + ubuf_raddr_p3;\n    if(ubuf_raddr_p1==mxuparam.ubuf_raddr_end1){\n        ubuf_raddr_p1 = 0;\n        if(ubuf_raddr_p2==mxuparam.ubuf_raddr_end2){\n            ubuf_raddr_p2 = 0;\n            ubuf_raddr_p3 = ubuf_raddr_p3 +  mxuparam.ubuf_raddr_step3;\n        }\n        else{\n            ubuf_raddr_p2 = ubuf_raddr_p2 +  mxuparam.ubuf_raddr_step2;\n        }\n    }\n    else{\n        ubuf_raddr_p1 = ubuf_raddr_p1 + mxuparam.ubuf_raddr_step1;\n    }\n\n\t\tfor(int j=0;j<MXU_ROWNUM;j++){\n\t\t\tfor(int k=MXU_ROWNUM+MXU_COLNUM-2;k>=0;k--){\n\t\t\t\tif(k>0)\n\t\t\t\t\tfeatreg[j][k] = featreg[j][k-1];\n\t\t\t\telse\n\t\t\t\t\tif(i<mxuparam.ubuf_raddr_num)\n\t\t\t\t\t\tfeatreg[j][k] = 
ubuf[ubuf_raddr][j];\n\t\t\t\t\telse\n\t\t\t\t\t\tfeatreg[j][k] = 0;\n\t\t\t}\n\t\t}\n\n\t\tfor(int j=MXU_ROWNUM-1;j>=0;j--){\n\t\t\tfor(int k=0;k<MXU_COLNUM;k++){\n\t\t\t\tap_int<32> biasreg;\n\t\t\t\tbiasreg(31,24)=weightreg[MXU_ROWNUM+0][k];\n\t\t\t\tbiasreg(23,16)=weightreg[MXU_ROWNUM+1][k];\n\t\t\t\tbiasreg(15, 8)=weightreg[MXU_ROWNUM+2][k];\n\t\t\t\tbiasreg( 7, 0)=weightreg[MXU_ROWNUM+3][k];\n\t\t\t\tif(j==0)\n\t\t\t\t\tpsumreg[j][k] = featreg[j][k+j]*weightreg[j][k] + biasreg;\n\t\t\t\telse\n\t\t\t\t\tpsumreg[j][k] = featreg[j][k+j]*weightreg[j][k] + psumreg[j-1][k];\n\t\t\t}\n\t\t}\n#pragma HLS DEPENDENCE variable=psum inter false\n#pragma HLS DEPENDENCE variable=psum intra false\n\t\tfor(int j=0;j<MXU_COLNUM;j++){\n\t\t\tif(i>=j+MXU_ROWNUM-1&&i<mxuparam.ubuf_raddr_num+j+MXU_ROWNUM-1){\n\t\t\t\tshort psum_raddr = mxuparam.psum_start + psum_addr_p1[j] + psum_addr_p2[j];\n\t\t\t\tif(psum_addr_p1[j]==mxuparam.psum_end1){\n\t\t\t\t\tpsum_addr_p1[j] = 0;\n\t\t\t\t\tpsum_addr_p2[j] = psum_addr_p2[j] + mxuparam.psum_step2;\n\t\t\t\t}\n\t\t\t\telse{\n\t\t\t\t\tpsum_addr_p1[j] = psum_addr_p1[j] + mxuparam.psum_step1;\n\t\t\t\t}\n\t\t\t\tif(mxuparam.isfirstpsum)\n\t\t\t\t\tpsum[psum_raddr][j] = psumreg[MXU_ROWNUM-1][j];\n\t\t\t\telse\n\t\t\t\t\tpsum[psum_raddr][j] = psumreg[MXU_ROWNUM-1][j] + psum[psum_raddr][j];\n\t\t\t}\n\t\t}\n\t}\n}\n\nvoid MXU(FEATDTYPE ubuf[16384][MXU_ROWNUM],WEIGHTDTYPE weight[512][MXU_COLNUM],PSUMDTYPE psum[512][MXU_COLNUM],MXU_PARAM mxuparam){\n#pragma HLS INTERFACE bram port=ubuf\n#pragma HLS INTERFACE bram port=weight\n#pragma HLS INTERFACE bram port=psum\n#pragma HLS DATA_PACK variable=mxuparam\n#pragma HLS ARRAY_PARTITION variable=ubuf complete dim=2\n#pragma HLS ARRAY_PARTITION variable=weight complete dim=2\n#pragma HLS ARRAY_PARTITION variable=psum complete dim=2\n\n\tstatic WEIGHTDTYPE weightreg1[4+MXU_ROWNUM][MXU_COLNUM];\n\tstatic WEIGHTDTYPE weightreg2[4+MXU_ROWNUM][MXU_COLNUM];\n#pragma HLS ARRAY_PARTITION variable=weightreg1 
complete dim=0\n#pragma HLS ARRAY_PARTITION variable=weightreg2 complete dim=0\n\n\tif(mxuparam.isping){\n\t\tSetWeight(weight,weightreg1,mxuparam.weight_raddr,mxuparam.isload);\n\t\tMacArray(ubuf,weightreg2,psum,mxuparam,mxuparam.iscalc);\n\t}\n\telse{\n\t\tSetWeight(weight,weightreg2,mxuparam.weight_raddr,mxuparam.isload);\n\t\tMacArray(ubuf,weightreg1,psum,mxuparam,mxuparam.iscalc);\n\t}\n}\n"
  },
  {
    "path": "lab1/src/tb_mxu.cpp",
    "content": "#include \"tpu.h\"\n#include \"stdio.h\"\n#include \"stdlib.h\"\n\nint main(){\n\tFEATDTYPE ubuf[16384][MXU_ROWNUM];\n\tWEIGHTDTYPE weight[512][MXU_COLNUM];\n\tint psum[512][MXU_COLNUM];\n\tFILE *fid;\n\tfid = fopen(\"feature.dat\",\"rb\");\n\tfor(int i=0;i<14*14;i++){\n\t\tfor(int j=0;j<32;j++){\n\t\t\tchar a;\n\t\t\tfread(&a,sizeof(char),1,fid);\n\t\t\tubuf[i][j] = a;\n\t\t}\n\t}\n\tfclose(fid);\n\tfid = fopen(\"weight.dat\",\"rb\");\n\tfor(int i=0;i<3*3*32+32;i++){\n\t\tfor(int j=0;j<32;j++){\n\t\t\tchar a;\n\t\t\tfread(&a,sizeof(char),1,fid);\n\t\t\tweight[i][j] = a;\n\t\t}\n\t}\n\tfclose(fid);\n\tMXU_PARAM mxuparam;\n\tstruct MXU_PARAM{\n\t\tbool isload;\n\t\tbool iscalc;\n\t\tbool isping;\n\t\tbool isfirstpsum;\n\n\t\tshort weight_raddr;\n\t\tshort ubuf_raddr_start;\n\t\tshort ubuf_raddr_step1;\n\t\tshort ubuf_raddr_step2;\n\t\tshort ubuf_raddr_step3;\n\t\tshort ubuf_raddr_end1;\n\t\tshort ubuf_raddr_end2;\n\t\tshort ubuf_raddr_end3;\n\t\tshort ubuf_raddr_num;\n\t\tshort psum_start;\n\n\t};\n\n\tmxuparam.isload = true;\n\tmxuparam.iscalc = false;\n\tmxuparam.isping = true;\n\tmxuparam.weight_raddr = 0;\n\tMXU(ubuf,weight,psum,mxuparam);\n// 2,2\n\tmxuparam.weight_raddr = 36;\n\tmxuparam.iscalc = true;\n\tmxuparam.isping = false;\n\tmxuparam.isfirstpsum = true;\n\tmxuparam.ubuf_raddr_start = 0;\n\tmxuparam.ubuf_raddr_num = 14*14;\n\tmxuparam.ubuf_raddr_step1 = 1;\n\tmxuparam.ubuf_raddr_step2 = 14;\n\tmxuparam.ubuf_raddr_step3 = 1;\n\tmxuparam.ubuf_raddr_end1 = 14-1;\n\tmxuparam.ubuf_raddr_end2 = 14-1;\n\tmxuparam.psum_start = 0;\n\tmxuparam.psum_step1 = 1;\n\tmxuparam.psum_end1 = 13;\n\tmxuparam.psum_step2 = 14;\n\tMXU(ubuf,weight,psum,mxuparam);\n\n//1,1\n\tmxuparam.weight_raddr = 36*2;\n\tmxuparam.iscalc = true;\n\tmxuparam.isping = true;\n\tmxuparam.isfirstpsum = false;\n\tmxuparam.ubuf_raddr_start = 0;\n\tmxuparam.ubuf_raddr_num = 13*13;\n\tmxuparam.ubuf_raddr_step1 = 1;\n\tmxuparam.ubuf_raddr_step2 = 14;\n\tmxuparam.ubuf_raddr_step3 = 
1;\n\tmxuparam.ubuf_raddr_end1 = 13-1;\n\tmxuparam.ubuf_raddr_end2 = 14*13-1;\n\tmxuparam.psum_start = 15;\n\tmxuparam.psum_step1 = 1;\n\tmxuparam.psum_end1 = 12;\n\tmxuparam.psum_step2 = 14;\n\tMXU(ubuf,weight,psum,mxuparam);\n//1,2\n\tmxuparam.weight_raddr = 36*3;\n\tmxuparam.iscalc = true;\n\tmxuparam.isping = false;\n\tmxuparam.isfirstpsum = false;\n\tmxuparam.ubuf_raddr_start = 0;\n\tmxuparam.ubuf_raddr_num = 14*13;\n\tmxuparam.ubuf_raddr_step1 = 1;\n\tmxuparam.ubuf_raddr_step2 = 14;\n\tmxuparam.ubuf_raddr_step3 = 1;\n\tmxuparam.ubuf_raddr_end1 = 14-1;\n\tmxuparam.ubuf_raddr_end2 = 14*13-1;\n\tmxuparam.psum_start = 14;\n\tmxuparam.psum_step1 = 1;\n\tmxuparam.psum_end1 = 13;\n\tmxuparam.psum_step2 = 14;\n\tMXU(ubuf,weight,psum,mxuparam);\n//1,3\n\tmxuparam.weight_raddr = 36*4;\n\tmxuparam.iscalc = true;\n\tmxuparam.isping = !mxuparam.isping;\n\tmxuparam.isfirstpsum = false;\n\tmxuparam.ubuf_raddr_start = 1;\n\tmxuparam.ubuf_raddr_num = 13*13;\n\tmxuparam.ubuf_raddr_step1 = 1;\n\tmxuparam.ubuf_raddr_step2 = 14;\n\tmxuparam.ubuf_raddr_step3 = 1;\n\tmxuparam.ubuf_raddr_end1 = 13-1;\n\tmxuparam.ubuf_raddr_end2 = 13*13-1;\n\tmxuparam.psum_start = 14;\n\tmxuparam.psum_step1 = 1;\n\tmxuparam.psum_end1 = 12;\n\tmxuparam.psum_step2 = 14;\n\tMXU(ubuf,weight,psum,mxuparam);\n//2,1\n\tmxuparam.weight_raddr = 36*5;\n\tmxuparam.iscalc = true;\n\tmxuparam.isping = !mxuparam.isping;\n\tmxuparam.isfirstpsum = false;\n\tmxuparam.ubuf_raddr_start = 0;\n\tmxuparam.ubuf_raddr_num = 14*13;\n\tmxuparam.ubuf_raddr_step1 = 1;\n\tmxuparam.ubuf_raddr_step2 = 14;\n\tmxuparam.ubuf_raddr_step3 = 1;\n\tmxuparam.ubuf_raddr_end1 = 13-1;\n\tmxuparam.ubuf_raddr_end2 = 14*13-1;\n\tmxuparam.psum_start = 1;\n\tmxuparam.psum_step1 = 1;\n\tmxuparam.psum_end1 = 12;\n\tmxuparam.psum_step2 = 14;\n\tMXU(ubuf,weight,psum,mxuparam);\n\n\n//2,3\n\tmxuparam.weight_raddr = 36*6;\n\tmxuparam.iscalc = true;\n\tmxuparam.isping = !mxuparam.isping;\n\tmxuparam.isfirstpsum = false;\n\tmxuparam.ubuf_raddr_start = 
1;\n\tmxuparam.ubuf_raddr_num = 14*13;\n\tmxuparam.ubuf_raddr_step1 = 1;\n\tmxuparam.ubuf_raddr_step2 = 14;\n\tmxuparam.ubuf_raddr_step3 = 1;\n\tmxuparam.ubuf_raddr_end1 = 13-1;\n\tmxuparam.ubuf_raddr_end2 = 14*13-1;\n\tmxuparam.psum_start = 0;\n\tmxuparam.psum_step1 = 1;\n\tmxuparam.psum_end1 = 12;\n\tmxuparam.psum_step2 = 14;\n\tMXU(ubuf,weight,psum,mxuparam);\n\n//3,1\n\tmxuparam.weight_raddr = 36*7;\n\tmxuparam.iscalc = true;\n\tmxuparam.isping = !mxuparam.isping;\n\tmxuparam.isfirstpsum = false;\n\tmxuparam.ubuf_raddr_start = 14;\n\tmxuparam.ubuf_raddr_num = 13*13;\n\tmxuparam.ubuf_raddr_step1 = 1;\n\tmxuparam.ubuf_raddr_step2 = 14;\n\tmxuparam.ubuf_raddr_step3 = 1;\n\tmxuparam.ubuf_raddr_end1 = 13-1;\n\tmxuparam.ubuf_raddr_end2 = 13*13-1;\n\tmxuparam.psum_start = 1;\n\tmxuparam.psum_step1 = 1;\n\tmxuparam.psum_end1 = 12;\n\tmxuparam.psum_step2 = 14;\n\tMXU(ubuf,weight,psum,mxuparam);\n\n//3,2\n\tmxuparam.weight_raddr = 36*8;\n\tmxuparam.iscalc = true;\n\tmxuparam.isping = !mxuparam.isping;\n\tmxuparam.isfirstpsum = false;\n\tmxuparam.ubuf_raddr_start = 14;\n\tmxuparam.ubuf_raddr_num = 14*13;\n\tmxuparam.ubuf_raddr_step1 = 1;\n\tmxuparam.ubuf_raddr_step2 = 14;\n\tmxuparam.ubuf_raddr_step3 = 1;\n\tmxuparam.ubuf_raddr_end1 = 14-1;\n\tmxuparam.ubuf_raddr_end2 = 14*13-1;\n\tmxuparam.psum_start = 0;\n\tmxuparam.psum_step1 = 1;\n\tmxuparam.psum_end1 = 13;\n\tmxuparam.psum_step2 = 14;\n\tMXU(ubuf,weight,psum,mxuparam);\n\n//3,3\n\tmxuparam.isload = false;\n\tmxuparam.iscalc = true;\n\tmxuparam.isping = !mxuparam.isping;\n\tmxuparam.isfirstpsum = false;\n\tmxuparam.ubuf_raddr_start = 15;\n\tmxuparam.ubuf_raddr_num = 13*13;\n\tmxuparam.ubuf_raddr_step1 = 1;\n\tmxuparam.ubuf_raddr_step2 = 14;\n\tmxuparam.ubuf_raddr_step3 = 1;\n\tmxuparam.ubuf_raddr_end1 = 13-1;\n\tmxuparam.ubuf_raddr_end2 = 13*13-1;\n\tmxuparam.psum_start = 0;\n\tmxuparam.psum_step1 = 1;\n\tmxuparam.psum_end1 = 12;\n\tmxuparam.psum_step2 = 14;\n\tMXU(ubuf,weight,psum,mxuparam);\n\n\tint err = 0;\n\tfid 
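= fopen(\"golden.dat\",\"rb\");\n\tfor(int i=0;i<14*14;i++){\n\t\tfor(int j=0;j<32;j++){\n\t\t\tint a;\n\t\t\tfread(&a,sizeof(int),1,fid);\n\t\t\tif(psum[i][j] != a)\n\t\t\t\terr++;\n\t\t}\n\t}\n\tfclose(fid);\n\t// Return the mismatch count so that C simulation fails on any difference from golden.dat.\n\treturn err;\n}\n"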
  },
  {
    "path": "lab1/src/tpu.h",
    "content": "#include \"ap_int.h\"\n\n#define MXU_COLNUM 32\n#define MXU_ROWNUM 32\n#define WEIGHTDTYPE char\n#define FEATDTYPE char\n#define PSUMDTYPE int\n\n\nstruct MXU_PARAM{\n\tbool isload;\n\tbool iscalc;\n\tbool isping;\n\tbool isfirstpsum;\n\n\tshort weight_raddr;\n\tshort ubuf_raddr_start;\n\tshort ubuf_raddr_step1;\n\tshort ubuf_raddr_step2;\n\tshort ubuf_raddr_step3;\n\tshort ubuf_raddr_end1;\n\tshort ubuf_raddr_end2;\n\tshort ubuf_raddr_end3;\n\tshort ubuf_raddr_num;\n\tshort psum_start;\n\tshort psum_step1;\n\tshort psum_end1;\n\tshort psum_step2;\n};\nstruct ACCREL_PARAM{\n\tbool isrelu;\n\tshort psum_raddr_start;\n\tshort psum_raddr_num;\n\n\tshort ubuf_waddr_start;\n\tshort ubuf_waddr_step1;\n\tshort ubuf_waddr_step2;\n\tshort ubuf_waddr_step3;\n\tshort ubuf_waddr_end1;\n\tshort ubuf_waddr_end2;\n\tshort ubuf_waddr_end3;\n};\n\nvoid MXU(FEATDTYPE ubuf[16384][MXU_ROWNUM],WEIGHTDTYPE weight[512][MXU_COLNUM],\n\t\tPSUMDTYPE psum[512][MXU_COLNUM],MXU_PARAM mxuparam);\n"
  },
  {
    "path": "lab2/README.md",
    "content": "\n# Relu, Normalization & Pooling \n\nBasic Module of Tensor Processing Unit\n\n## 1.Env & Build  \n **env** :   \n - Vivado HLS 2018.2 or 2016.3 , MATLAB 2014a(for matlabcode)  \n \n **run** :  \n - step1: `vivado_hls -f run_hls.tcl`\n - step2: lanch vivado HLS and open the project  \n - step3: Run C synthesis, C/RTL cosimulation e.t.c\n\n## 2.Relative Link  "
  },
  {
    "path": "lab2/run_hls.tcl",
    "content": "open_project -reset relu_norm_pool_prj\nset_top relu_norm_pool\nadd_files src/tpu.h\nadd_files src/relu_norm_pool.cpp\nadd_files -tb src/tb_pool.cpp\n\nopen_solution -reset \"solution1\"\nset_part {xczu7cg-fbvb900-2-i} -tool vivado\ncreate_clock -period 2.5 -name default\n\ncsim_design\n# Do not perform any other steps\n# - The basic project will be opened in the GUI \nexit\n"
  },
  {
    "path": "lab2/src/relu_norm_pool.cpp",
    "content": "\n#include \"tpu.h\"\n\nvoid relu_norm_pool(PSUMDTYPE psum_buffer[512][MXU_COLNUM],FEATDTYPE unified_buffer[16384][MXU_ROWNUM],\n\t\tint norm_coef[MXU_COLNUM],RELPOOL_PARAM param){\n#pragma HLS INTERFACE bram port=unified_buffer\n#pragma HLS INTERFACE bram port=psum_buffer\n#pragma HLS ARRAY_PARTITION variable=norm_coef complete dim=1\n#pragma HLS ARRAY_PARTITION variable=unified_buffer complete dim=2\n#pragma HLS ARRAY_PARTITION variable=psum_buffer complete dim=2\n\n\tPSUMDTYPE psumreg[MXU_COLNUM];\n\tPSUMDTYPE psumrelu[MXU_COLNUM];\n\tPSUMDTYPE psumpool[MXU_COLNUM];\n\tFEATDTYPE res[MXU_COLNUM];\n\tFEATDTYPE relu[MXU_COLNUM];\n\tshort pool[MXU_COLNUM];\n#pragma HLS ARRAY_PARTITION variable=psumreg complete dim=1\n#pragma HLS ARRAY_PARTITION variable=psumsht complete dim=1\n#pragma HLS ARRAY_PARTITION variable=res complete dim=1\n#pragma HLS ARRAY_PARTITION variable=relu complete dim=1\n#pragma HLS ARRAY_PARTITION variable=pool complete dim=1\n\n\tchar pool_kw_cnt = 0;\n\tchar pool_kh_cnt = 0;\n\tchar pool_w_cnt = 0;\n\tchar pool_h_cnt = 0;\n\tshort ubuf_waddr_p1=0;\n\tshort ubuf_waddr_p2=0;\n\tshort ubuf_waddr_p3=0;\n\n\tfor(short i=0;i<param.pool_cnt;i++){\n#pragma HLS PIPELINE\n\n\t\tshort raddr = param.psum_raddr_start + (pool_h_cnt+pool_kh_cnt)*param.pool_h_step\n\t\t\t\t+ (pool_w_cnt+pool_kw_cnt);\n\t\tfor(int j=0;j<MXU_COLNUM;j++){\n\t\t\tpsumreg[j] = psum_buffer[raddr][j];\n\t\t\tif(psumreg[j]<0&&param.isrelu)\n\t\t\t\tpsumrelu[j] = 0;\n\t\t\telse\n\t\t\t\tpsumrelu[j] = psumreg[j];\n\t\t\tif(pool_kw_cnt==0&&pool_kh_cnt==0){\n\t\t\t\tpsumpool[j] = psumrelu[j];\n\t\t\t}\n\t\t\telse if(param.maxpool){\n\t\t\t\tif(psumrelu[j]>psumpool[j])\n\t\t\t\t\tpsumpool[j] = psumrelu[j];\n\t\t\t}\n\t\t\telse{\n\t\t\t\tpsumpool[j] = psumpool[j] + psumrelu[j];\n\t\t\t}\n\t\t}\n\n\t\tif(pool_kw_cnt==param.pool_kw&&pool_kh_cnt==param.pool_kh){\n\t\t\tshort ubuf_waddr = param.ubuf_waddr_start + ubuf_waddr_p1 + ubuf_waddr_p2 + 
ubuf_waddr_p3;\n\t\t\tif(ubuf_waddr_p1==param.ubuf_waddr_end1){\n\t\t\t\t// Restart the innermost offset, mirroring the read-address update in MacArray.\n\t\t\t\tubuf_waddr_p1 = 0;\n\t\t\t\tif(ubuf_waddr_p2==param.ubuf_waddr_end2){\n\t\t\t\t\tubuf_waddr_p2 = 0;\n\t\t\t\t\tubuf_waddr_p3 = ubuf_waddr_p3 + param.ubuf_waddr_step3;\n\t\t\t\t}\n\t\t\t\telse{\n\t\t\t\t\tubuf_waddr_p2 = ubuf_waddr_p2 + param.ubuf_waddr_step2;\n\t\t\t\t}\n\t\t\t}\n\t\t\telse{\n\t\t\t\tubuf_waddr_p1 = ubuf_waddr_p1 + param.ubuf_waddr_step1;\n\t\t\t}\n\t\t\tfor(int j=0;j<MXU_COLNUM;j++){\n\t\t\t\tlong tmp;\n\t\t\t\ttmp = long(psumpool[j])*long(norm_coef[j]);\n\t\t\t\tint tmpcut = tmp>>32;\n\t\t\t\tap_int<8> res;\n\t\t\t\tif(tmpcut>127)\n\t\t\t\t\tres = 127;\n\t\t\t\telse if(tmpcut<-128)\n\t\t\t\t\tres = -128;\n\t\t\t\telse\n\t\t\t\t\tres = tmpcut;\n\t\t\t\tunified_buffer[ubuf_waddr][j] = res;\n\t\t\t}\n\t\t}\n\n\t\tif(pool_kw_cnt==param.pool_kw){\n\t\t\tpool_kw_cnt = 0;\n\t\t\tif(pool_kh_cnt==param.pool_kh){\n\t\t\t\tpool_kh_cnt = 0;\n\t\t\t\tif(pool_w_cnt==param.pool_w){\n\t\t\t\t\tpool_w_cnt = 0;\n\t\t\t\t\tpool_h_cnt = pool_h_cnt + param.pool_sh;\n\t\t\t\t}\n\t\t\t\telse{\n\t\t\t\t\tpool_w_cnt = pool_w_cnt + param.pool_sw;\n\t\t\t\t}\n\t\t\t}\n\t\t\telse{\n\t\t\t\tpool_kh_cnt = pool_kh_cnt + 1;\n\t\t\t}\n\t\t}\n\t\telse{\n\t\t\tpool_kw_cnt = pool_kw_cnt + 1;\n\t\t}\n\t}\n}\n"
  },
  {
    "path": "lab2/src/tb_pool.cpp",
    "content": "#include \"tpu.h\"\n#include \"stdio.h\"\n#include \"stdlib.h\"\n\nint main(){\n\tPSUMDTYPE psum_buffer[512][MXU_COLNUM];\n\tFEATDTYPE unified_buffer[16384][MXU_ROWNUM];\n\tint norm_coef[MXU_COLNUM];\n\tRELPOOL_PARAM param;\n\tfor(int i=0;i<14;i++){\n\t\tfor(int j=0;j<14;j++){\n\t\t\tfor(int c=0;c<32;c++){\n\t\t\t\tpsum_buffer[i*14+j][c] = (i*14+j+c)*512;\n\t\t\t}\n\t\t}\n\t}\n\tfor(int c=0;c<32;c++)\n\t\tnorm_coef[c] = 1<<23;\n\n\t// no pooling\n\tparam.isrelu = true;\n\tparam.psum_raddr_start = 0;\n\tparam.maxpool = true;\n\tparam.pool_kw = 0;\n\tparam.pool_kh = 0;\n\tparam.pool_w = 14-1;\n\tparam.pool_sw = 1;\n\tparam.pool_sh = 1;\n\tparam.pool_cnt = 14*14;\n\tparam.pool_h_step = 14;\n\tparam.ubuf_waddr_start = 0;\n\tparam.ubuf_waddr_step1 = 1;\n\tparam.ubuf_waddr_end1 = 14*14-1;\n\trelu_norm_pool(psum_buffer,unified_buffer,norm_coef,param);\n\n\tFEATDTYPE golden[14*14][MXU_ROWNUM];\n\tfor(int i=0;i<14;i++){\n\t\tfor(int j=0;j<14;j++){\n\t\t\tfor (int k=0;k<32;k++){\n\t\t\t\tint tmp = psum_buffer[i*14+j][k]/512;\n\t\t\t\ttmp = tmp>127?127:tmp;\n\t\t\t\ttmp = tmp<-128?-128:tmp;\n\t\t\t\tgolden[i*14+j][k] = tmp;\n\t\t\t}\n\t\t}\n\t}\n\tint err=0;\n\tfor(int i=0;i<14*14;i++){\n\t\tfor(int k=0;k<32;k++){\n\t\t\tif(golden[i][k]!=unified_buffer[i][k])\n\t\t\t\terr ++;\n\t\t}\n\t}\n\n\t// max pooling 2,2\n\tfor(int c=0;c<32;c++)\n\t\tnorm_coef[c] = 1<<23;\n\tparam.isrelu = true;\n\tparam.psum_raddr_start = 0;\n\tparam.maxpool = true;\n\tparam.pool_kw = 1;\n\tparam.pool_kh = 1;\n\tparam.pool_w = 12;\n\tparam.pool_sw = 2;\n\tparam.pool_sh = 2;\n\tparam.pool_cnt = 14*14;\n\tparam.pool_h_step = 14;\n\tparam.ubuf_waddr_start = 0;\n\tparam.ubuf_waddr_step1 = 1;\n\tparam.ubuf_waddr_end1 = 7*7-1;\n\trelu_norm_pool(psum_buffer,unified_buffer,norm_coef,param);\n\n\tfor(int i=0;i<7;i++){\n\t\tfor(int j=0;j<7;j++){\n\t\t\tfor (int k=0;k<32;k++){\n\t\t\t\tint tmp = -128;\n\t\t\t\tfor(int i1=0;i1<2;i1++){\n\t\t\t\t\tfor(int 
j1=0;j1<2;j1++){\n\t\t\t\t\t\tif(tmp<psum_buffer[(2*i+i1)*14+2*j+j1][k]/512)\n\t\t\t\t\t\t\ttmp = psum_buffer[(2*i+i1)*14+2*j+j1][k]/512;\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t\ttmp = tmp>127?127:tmp;\n\t\t\t\ttmp = tmp<-128?-128:tmp;\n\t\t\t\tgolden[i*7+j][k] = tmp;\n\t\t\t}\n\t\t}\n\t}\n\tfor(int i=0;i<7*7;i++){\n\t\tfor(int k=0;k<32;k++){\n\t\t\tif(golden[i][k]!=unified_buffer[i][k])\n\t\t\t\terr ++;\n\t\t}\n\t}\n\n\tfor(int c=0;c<32;c++)\n\t\tnorm_coef[c] = 171196;\n\t// avg pooling 7,7\n\tparam.isrelu = true;\n\tparam.psum_raddr_start = 0;\n\tparam.maxpool = false;\n\tparam.pool_kw = 6;\n\tparam.pool_kh = 6;\n\tparam.pool_w = 7;\n\tparam.pool_sw = 7;\n\tparam.pool_sh = 7;\n\tparam.pool_cnt = 14*14;\n\tparam.pool_h_step = 14;\n\tparam.ubuf_waddr_start = 0;\n\tparam.ubuf_waddr_step1 = 1;\n\tparam.ubuf_waddr_end1 = 14*14-1;\n\trelu_norm_pool(psum_buffer,unified_buffer,norm_coef,param);\n\n\tfor(int i=0;i<2;i++){\n\t\tfor(int j=0;j<2;j++){\n\t\t\tfor (int k=0;k<32;k++){\n\t\t\t\tint tmp = 0;\n\t\t\t\tfor(int i1=0;i1<7;i1++){\n\t\t\t\t\tfor(int j1=0;j1<7;j1++){\n\t\t\t\t\t\ttmp += psum_buffer[(i*7+i1)*14+7*j+j1][k];\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t\ttmp = (long(tmp)*long(171196))>>32;\n\t\t\t\ttmp = tmp>127?127:tmp;\n\t\t\t\ttmp = tmp<-128?-128:tmp;\n\t\t\t\tgolden[i*2+j][k] = tmp;\n\t\t\t}\n\t\t}\n\t}\n\tfor(int i=0;i<2*2;i++){\n\t\tfor(int k=0;k<32;k++){\n\t\t\tif(golden[i][k]!=unified_buffer[i][k])\n\t\t\t\terr ++;\n\t\t}\n\t}\n\treturn err;\n}\n"
  },
  {
    "path": "lab2/src/tpu.h",
    "content": "#include \"ap_int.h\"\n\n#define MXU_COLNUM 32\n#define MXU_ROWNUM 32\n#define WEIGHTDTYPE char\n#define FEATDTYPE char\n#define PSUMDTYPE int\n\n\nstruct MXU_PARAM{\n\tbool isload;\n\tbool iscalc;\n\tbool isping;\n\tbool isfirstpsum;\n\n\tshort weight_raddr;\n\tshort ubuf_raddr_start;\n\tshort ubuf_raddr_step1;\n\tshort ubuf_raddr_step2;\n\tshort ubuf_raddr_step3;\n\tshort ubuf_raddr_end1;\n\tshort ubuf_raddr_end2;\n\tshort ubuf_raddr_end3;\n\tshort ubuf_raddr_num;\n\tshort psum_start;\n\tshort psum_step1;\n\tshort psum_end1;\n\tshort psum_step2;\n};\nstruct RELPOOL_PARAM{\n\tbool isrelu;\n\tshort psum_raddr_start;\n\n\tbool maxpool; // max pool or average pool\n\tchar pool_kw;\n\tchar pool_kh;\n\tchar pool_w;\n\tchar pool_sw;\n\tchar pool_sh;\n\tshort pool_cnt; // output_num*pool_kw*pool_kh\n\tshort pool_h_step;\n\n\tshort ubuf_waddr_start;\n\tshort ubuf_waddr_step1;\n\tshort ubuf_waddr_step2;\n\tshort ubuf_waddr_step3;\n\tshort ubuf_waddr_end1;\n\tshort ubuf_waddr_end2;\n\tshort ubuf_waddr_end3;\n};\n\nvoid MXU(FEATDTYPE ubuf[16384][MXU_ROWNUM],WEIGHTDTYPE weight[512][MXU_COLNUM],\n\t\tPSUMDTYPE psum[512][MXU_COLNUM],MXU_PARAM mxuparam);\nvoid relu_norm_pool(PSUMDTYPE psum_buffer[512][MXU_COLNUM],FEATDTYPE unified_buffer[16384][MXU_ROWNUM],\n\t\tint norm_coef[MXU_COLNUM],RELPOOL_PARAM param);\n"
  },
  {
    "path": "src/ctrl.cpp",
    "content": "#include \"tpu.h\"\n\nvoid loadWeight(ap_uint<256> *ddr,WEIGHTDTYPE weight_buffer[512][MXU_COLNUM],\n\t\tunsigned offset,short addr, short len, bool enable){\n\tif(!enable)\n\t\treturn;\n\tfor(int i=0;i<len;i++){\n#pragma HLS PIPELINE\n\t\tap_uint<256> tmp = ddr[offset+i];\n\t\tfor(int j=0;j<32;j++){\n\t\t\tweight_buffer[addr+i][j] = tmp(j*8+7,j*8);\n\t\t}\n\t}\n}\n\nvoid loadFeature(ap_uint<256> *ddr,FEATDTYPE unified_buffer[512][MXU_ROWNUM],\n\t\tunsigned offset,short addr, short len, bool enable){\n\tif(!enable)\n\t\treturn;\n\tfor(int i=0;i<len;i++){\n#pragma HLS PIPELINE\n\t\tap_uint<256> tmp = ddr[offset+i];\n\t\tfor(int j=0;j<32;j++){\n\t\t\tunified_buffer[addr+i][j] = tmp(j*8+7,j*8);\n\t\t}\n\t}\n}\nvoid storeFeature(ap_uint<256> *ddr,FEATDTYPE unified_buffer[512][MXU_COLNUM],\n\t\tunsigned offset,short addr, short len, bool enable){\n\tif(!enable)\n\t\treturn;\n\tfor(int i=0;i<len;i++){\n#pragma HLS PIPELINE\n\t\tap_uint<256> tmp;\n\t\tfor(int j=0;j<32;j++){\n\t\t\ttmp(j*8+7,j*8) = unified_buffer[addr+i][j];;\n\t\t}\n\t\tddr[offset+i] = tmp;\n\t}\n}\n//set instr. set register\n//run instr. run process\n//eop instr. 
end of process\n//\n\nvoid instr(ap_uint<64> *ddr,unsigned &offset,ap_int<16> reggroup[96],ap_int<8> &runmode,bool enable){\n#pragma HLS INTERFACE m_axi depth=8192 port=ddr\n#pragma HLS ARRAY_PARTITION variable=reggroup complete dim=1\n\tif(!enable)\n\t\treturn;\n\tbool isRunInstr = false;\n\twhile(!isRunInstr){\n\t\tap_uint<64> tmp = ddr[offset];\n\t\toffset++;\n\t\tif(tmp[63]==0){\n\t\t\tswitch(tmp(52,48)){\n\t\t\tcase( 0):reggroup[ 0] = tmp(15, 0);reggroup[ 1] = tmp(31,16);reggroup[ 2] = tmp(47,32);break;\n\t\t\tcase( 1):reggroup[ 3] = tmp(15, 0);reggroup[ 4] = tmp(31,16);reggroup[ 5] = tmp(47,32);break;\n\t\t\tcase( 2):reggroup[ 6] = tmp(15, 0);reggroup[ 7] = tmp(31,16);reggroup[ 8] = tmp(47,32);break;\n\t\t\tcase( 3):reggroup[ 9] = tmp(15, 0);reggroup[10] = tmp(31,16);reggroup[11] = tmp(47,32);break;\n\t\t\tcase( 4):reggroup[12] = tmp(15, 0);reggroup[13] = tmp(31,16);reggroup[14] = tmp(47,32);break;\n\t\t\tcase( 5):reggroup[15] = tmp(15, 0);reggroup[16] = tmp(31,16);reggroup[17] = tmp(47,32);break;\n\t\t\tcase( 6):reggroup[18] = tmp(15, 0);reggroup[19] = tmp(31,16);reggroup[20] = tmp(47,32);break;\n\t\t\tcase( 7):reggroup[21] = tmp(15, 0);reggroup[22] = tmp(31,16);reggroup[23] = tmp(47,32);break;\n\t\t\tcase( 8):reggroup[24] = tmp(15, 0);reggroup[25] = tmp(31,16);reggroup[26] = tmp(47,32);break;\n\t\t\tcase( 9):reggroup[27] = tmp(15, 0);reggroup[28] = tmp(31,16);reggroup[29] = tmp(47,32);break;\n\t\t\tcase(10):reggroup[30] = tmp(15, 0);reggroup[31] = tmp(31,16);reggroup[32] = tmp(47,32);break;\n\t\t\tcase(11):reggroup[33] = tmp(15, 0);reggroup[34] = tmp(31,16);reggroup[35] = tmp(47,32);break;\n\t\t\tcase(12):reggroup[36] = tmp(15, 0);reggroup[37] = tmp(31,16);reggroup[38] = tmp(47,32);break;\n\t\t\tcase(13):reggroup[39] = tmp(15, 0);reggroup[40] = tmp(31,16);reggroup[41] = tmp(47,32);break;\n\t\t\tcase(14):reggroup[42] = tmp(15, 0);reggroup[43] = tmp(31,16);reggroup[44] = tmp(47,32);break;\n\t\t\tcase(15):reggroup[45] = tmp(15, 0);reggroup[46] = 
tmp(31,16);reggroup[47] = tmp(47,32);break;\n\t\t\tcase(16):reggroup[48] = tmp(15, 0);reggroup[49] = tmp(31,16);reggroup[50] = tmp(47,32);break;\n\t\t\tcase(17):reggroup[51] = tmp(15, 0);reggroup[52] = tmp(31,16);reggroup[53] = tmp(47,32);break;\n\t\t\tcase(18):reggroup[54] = tmp(15, 0);reggroup[55] = tmp(31,16);reggroup[56] = tmp(47,32);break;\n\t\t\tcase(19):reggroup[57] = tmp(15, 0);reggroup[58] = tmp(31,16);reggroup[59] = tmp(47,32);break;\n\t\t\tcase(20):reggroup[60] = tmp(15, 0);reggroup[61] = tmp(31,16);reggroup[62] = tmp(47,32);break;\n\t\t\tcase(21):reggroup[63] = tmp(15, 0);reggroup[64] = tmp(31,16);reggroup[65] = tmp(47,32);break;\n\t\t\tcase(22):reggroup[66] = tmp(15, 0);reggroup[67] = tmp(31,16);reggroup[68] = tmp(47,32);break;\n\t\t\tcase(23):reggroup[69] = tmp(15, 0);reggroup[70] = tmp(31,16);reggroup[71] = tmp(47,32);break;\n\t\t\tcase(24):reggroup[72] = tmp(15, 0);reggroup[73] = tmp(31,16);reggroup[74] = tmp(47,32);break;\n\t\t\tcase(25):reggroup[75] = tmp(15, 0);reggroup[76] = tmp(31,16);reggroup[77] = tmp(47,32);break;\n\t\t\tcase(26):reggroup[78] = tmp(15, 0);reggroup[79] = tmp(31,16);reggroup[80] = tmp(47,32);break;\n\t\t\tcase(27):reggroup[81] = tmp(15, 0);reggroup[82] = tmp(31,16);reggroup[83] = tmp(47,32);break;\n\t\t\tcase(28):reggroup[84] = tmp(15, 0);reggroup[85] = tmp(31,16);reggroup[86] = tmp(47,32);break;\n\t\t\tcase(29):reggroup[87] = tmp(15, 0);reggroup[88] = tmp(31,16);reggroup[89] = tmp(47,32);break;\n\t\t\tcase(30):reggroup[90] = tmp(15, 0);reggroup[91] = tmp(31,16);reggroup[92] = tmp(47,32);break;\n\t\t\tcase(31):reggroup[93] = tmp(15, 0);reggroup[94] = tmp(31,16);reggroup[95] = tmp(47,32);break;\n\t\t\t}\n\t\t}\n\t\telse{\n\t\t\trunmode = tmp(55,48);\n\t\t\tisRunInstr = true;\n\t\t}\n\t}\n}\n\n\nvoid config(ap_int<16> reggroup[96],MXU_PARAM &mxuparam,RELPOOL_PARAM &poolparam,LDST_PARAM &lsdtparam, ap_int<32> norm_coef[32]){\n#pragma HLS INLINE\n    mxuparam.isload          = reggroup[ 0].range(0,0);\n\tmxuparam.iscalc          = 
reggroup[ 0].range(1,1);\n\tmxuparam.isping          = reggroup[ 0].range(2,2);\n\tmxuparam.isfirstpsum     = reggroup[ 0].range(3,3);\n\tmxuparam.weight_raddr    = reggroup[ 1];\n\tmxuparam.ubuf_raddr_start= reggroup[ 2];\n\tmxuparam.ubuf_raddr_step1= reggroup[ 3];\n\tmxuparam.ubuf_raddr_step2= reggroup[ 4];\n\tmxuparam.ubuf_raddr_step3= reggroup[ 5];\n\tmxuparam.ubuf_raddr_end1 = reggroup[ 6];\n\tmxuparam.ubuf_raddr_end2 = reggroup[ 7];\n\tmxuparam.ubuf_raddr_end3 = reggroup[ 8];\n\tmxuparam.ubuf_raddr_num  = reggroup[ 9];\n\tmxuparam.psum_start      = reggroup[10];\n\tmxuparam.psum_step1      = reggroup[11];\n\tmxuparam.psum_end1       = reggroup[12];\n\tmxuparam.psum_step2      = reggroup[13];\n\n\tpoolparam.isrelu    = reggroup[14].range( 0,0);\n\tpoolparam.maxpool   = reggroup[14].range( 1,1);\n\tpoolparam.avg_shift = reggroup[14].range( 7,4);\n\tpoolparam.pool_kw   = reggroup[14].range(15,8);\n\tpoolparam.pool_kh   = reggroup[15].range( 7,0);\n\tpoolparam.pool_w    = reggroup[15].range(15,8);\n\tpoolparam.pool_sw   = reggroup[16].range( 7,0);\n\tpoolparam.pool_sh   = reggroup[16].range(15,8);\n\tpoolparam.psum_raddr_start = reggroup[17];\n\tpoolparam.pool_cnt         = reggroup[18];\n\tpoolparam.pool_h_step      = reggroup[19];\n\tpoolparam.avg_val          = reggroup[20];\n\tpoolparam.ubuf_waddr_start = reggroup[21];\n\tpoolparam.ubuf_waddr_step1 = reggroup[22];\n\tpoolparam.ubuf_waddr_step2 = reggroup[23];\n\tpoolparam.ubuf_waddr_step3 = reggroup[24];\n\tpoolparam.ubuf_waddr_end1  = reggroup[25];\n\tpoolparam.ubuf_waddr_end2  = reggroup[26];\n\tpoolparam.ubuf_waddr_end3  = reggroup[27];\n\n\tlsdtparam.weight_addr = reggroup[28];\n\tlsdtparam.weight_ldlen = reggroup[29];\n\tap_uint<32> tmp = (reggroup[31],reggroup[30]);\n\tlsdtparam.weight_offset = tmp;\n\tfor(int i=0;i<32;i++){\n#pragma HLS UNROLL\n\t\tnorm_coef[i] = (reggroup[33+2*i],reggroup[32+2*i]);\n\t}\n\treturn;\n}\n"
  },
  {
    "path": "src/mxu.cpp",
    "content": "\n#include \"tpu.h\"\n\nvoid SetWeight(WEIGHTDTYPE weight[512][MXU_COLNUM],WEIGHTDTYPE weightreg[MXU_ROWNUM+4][MXU_COLNUM],\n\t\tshort weight_raddr, bool enable){\n\tif(!enable)\n\t\treturn;\n\tfor(short i=weight_raddr;i<weight_raddr+4+MXU_ROWNUM;i++){\n#pragma HLS PIPELINE\n\t\tfor(int j=0;j<MXU_ROWNUM+4;j++){\n\t\t\tfor(int k=0;k<MXU_COLNUM;k++){\n\t\t\t\tif(j!=MXU_ROWNUM+3)\n\t\t\t\t\tweightreg[j][k] = weightreg[j+1][k];\n\t\t\t\telse\n\t\t\t\t\tweightreg[j][k] = weight[i][k];\n\t\t\t}\n\t\t}\n\t}\n}\n\nvoid MacArray(FEATDTYPE ubuf[16384][MXU_ROWNUM],WEIGHTDTYPE weightreg[4+MXU_ROWNUM][MXU_COLNUM],\n\t\tPSUMDTYPE psum[512][MXU_COLNUM],MXU_PARAM mxuparam,bool enable){\n\tif(!enable)\n\t\treturn;\n\tFEATDTYPE featreg[MXU_ROWNUM][MXU_COLNUM+MXU_ROWNUM-1];\n\tPSUMDTYPE psumreg[MXU_ROWNUM][MXU_COLNUM];\n    short ubuf_raddr_p1=0;\n    short ubuf_raddr_p2=0;\n    short ubuf_raddr_p3=0;\n    short psum_addr_p1[MXU_COLNUM];\n    short psum_addr_p2[MXU_COLNUM];\n    for(int i=0;i<MXU_COLNUM;i++){\n#pragma HLS UNROLL\n    \tpsum_addr_p1[i] = 0;\n    \tpsum_addr_p2[i] = 0;\n    }\n\tfor(short i=0;i<mxuparam.ubuf_raddr_num+MXU_ROWNUM+MXU_COLNUM-2;i++){\n#pragma HLS PIPELINE\n    short ubuf_raddr = mxuparam.ubuf_raddr_start + ubuf_raddr_p1 + ubuf_raddr_p2 + ubuf_raddr_p3;\n    if(ubuf_raddr_p1==mxuparam.ubuf_raddr_end1){\n        ubuf_raddr_p1 = 0;\n        if(ubuf_raddr_p2==mxuparam.ubuf_raddr_end2){\n            ubuf_raddr_p2 = 0;\n            ubuf_raddr_p3 = ubuf_raddr_p3 +  mxuparam.ubuf_raddr_step3;\n        }\n        else{\n            ubuf_raddr_p2 = ubuf_raddr_p2 +  mxuparam.ubuf_raddr_step2;\n        }\n    }\n    else{\n        ubuf_raddr_p1 = ubuf_raddr_p1 + mxuparam.ubuf_raddr_step1;\n    }\n\n\t\tfor(int j=0;j<MXU_ROWNUM;j++){\n\t\t\tfor(int k=MXU_ROWNUM+MXU_COLNUM-2;k>=0;k--){\n\t\t\t\tif(k>0)\n\t\t\t\t\tfeatreg[j][k] = featreg[j][k-1];\n\t\t\t\telse\n\t\t\t\t\tif(i<mxuparam.ubuf_raddr_num)\n\t\t\t\t\t\tfeatreg[j][k] = 
ubuf[ubuf_raddr][j];\n\t\t\t\t\telse\n\t\t\t\t\t\tfeatreg[j][k] = 0;\n\t\t\t}\n\t\t}\n\n\t\tfor(int j=MXU_ROWNUM-1;j>=0;j--){\n\t\t\tfor(int k=0;k<MXU_COLNUM;k++){\n\t\t\t\tap_int<32> biasreg;\n\t\t\t\tbiasreg(31,24)=weightreg[MXU_ROWNUM+0][k];\n\t\t\t\tbiasreg(23,16)=weightreg[MXU_ROWNUM+1][k];\n\t\t\t\tbiasreg(15, 8)=weightreg[MXU_ROWNUM+2][k];\n\t\t\t\tbiasreg( 7, 0)=weightreg[MXU_ROWNUM+3][k];\n\t\t\t\tif(j==0)\n\t\t\t\t\tpsumreg[j][k] = featreg[j][k+j]*weightreg[j][k] + biasreg;\n\t\t\t\telse\n\t\t\t\t\tpsumreg[j][k] = featreg[j][k+j]*weightreg[j][k] + psumreg[j-1][k];\n\t\t\t}\n\t\t}\n#pragma HLS DEPENDENCE variable=psum inter false\n#pragma HLS DEPENDENCE variable=psum intra false\n\t\tfor(int j=0;j<MXU_COLNUM;j++){\n\t\t\tif(i>=j+MXU_ROWNUM-1&&i<mxuparam.ubuf_raddr_num+j+MXU_ROWNUM-1){\n\t\t\t\tshort psum_raddr = mxuparam.psum_start%512 + psum_addr_p1[j] + psum_addr_p2[j];\n\t\t\t\tif(psum_addr_p1[j]==mxuparam.psum_end1){\n\t\t\t\t\tpsum_addr_p1[j] = 0;\n\t\t\t\t\tpsum_addr_p2[j] = psum_addr_p2[j] + mxuparam.psum_step2;\n\t\t\t\t}\n\t\t\t\telse{\n\t\t\t\t\tpsum_addr_p1[j] = psum_addr_p1[j] + mxuparam.psum_step1;\n\t\t\t\t}\n\t\t\t\tif(mxuparam.isfirstpsum)\n\t\t\t\t\tpsum[psum_raddr][j] = psumreg[MXU_ROWNUM-1][j];\n\t\t\t\telse\n\t\t\t\t\tpsum[psum_raddr][j] = psumreg[MXU_ROWNUM-1][j] + psum[psum_raddr][j];\n\t\t\t}\n\t\t}\n\t}\n}\n\nvoid MXU(FEATDTYPE ubuf[16384][MXU_ROWNUM],WEIGHTDTYPE weight[512][MXU_COLNUM],\n\t\tPSUMDTYPE psum[512][MXU_COLNUM],MXU_PARAM mxuparam, bool enable){\n//#pragma HLS INTERFACE bram port=ubuf\n//#pragma HLS INTERFACE bram port=weight\n//#pragma HLS INTERFACE bram port=psum\n//#pragma HLS DATA_PACK variable=mxuparam\n//#pragma HLS ARRAY_PARTITION variable=ubuf complete dim=2\n//#pragma HLS ARRAY_PARTITION variable=weight complete dim=2\n//#pragma HLS ARRAY_PARTITION variable=psum complete dim=2\n\n\tstatic WEIGHTDTYPE weightreg1[4+MXU_ROWNUM][MXU_COLNUM];\n\tstatic WEIGHTDTYPE weightreg2[4+MXU_ROWNUM][MXU_COLNUM];\n#pragma HLS 
ARRAY_PARTITION variable=weightreg1 complete dim=0\n#pragma HLS ARRAY_PARTITION variable=weightreg2 complete dim=0\n\n\tif(!enable)\n\t\treturn;\n\tif(mxuparam.isping){\n\t\tSetWeight(weight,weightreg1,mxuparam.weight_raddr,mxuparam.isload);\n\t\tMacArray(ubuf,weightreg2,psum,mxuparam,mxuparam.iscalc);\n\t}\n\telse{\n\t\tSetWeight(weight,weightreg2,mxuparam.weight_raddr,mxuparam.isload);\n\t\tMacArray(ubuf,weightreg1,psum,mxuparam,mxuparam.iscalc);\n\t}\n}\n"
  },
  {
    "path": "src/norm_relu_pool.cpp",
    "content": "\n#include \"tpu.h\"\n\nvoid relu_norm_pool(PSUMDTYPE psum_buffer[512][MXU_COLNUM],FEATDTYPE unified_buffer[16384][MXU_ROWNUM],\n\t\tap_int<32> norm_coef[MXU_COLNUM],RELPOOL_PARAM param, bool enable){\n//#pragma HLS INTERFACE bram port=unified_buffer\n//#pragma HLS INTERFACE bram port=psum_buffer\n//#pragma HLS ARRAY_PARTITION variable=norm_coef complete dim=1\n//#pragma HLS ARRAY_PARTITION variable=unified_buffer complete dim=2\n//#pragma HLS ARRAY_PARTITION variable=psum_buffer complete dim=2\n\n\tPSUMDTYPE psumreg[MXU_COLNUM];\n\tPSUMDTYPE psumrelu[MXU_COLNUM];\n\tPSUMDTYPE psumpool[MXU_COLNUM];\n\tFEATDTYPE relu[MXU_COLNUM];\n\tshort pool[MXU_COLNUM];\n#pragma HLS ARRAY_PARTITION variable=psumreg complete dim=1\n#pragma HLS ARRAY_PARTITION variable=psumsht complete dim=1\n#pragma HLS ARRAY_PARTITION variable=relu complete dim=1\n#pragma HLS ARRAY_PARTITION variable=pool complete dim=1\n\n\tchar pool_kw_cnt = 0;\n\tchar pool_kh_cnt = 0;\n\tchar pool_w_cnt = 0;\n\tchar pool_h_cnt = 0;\n\tshort ubuf_waddr_p1=0;\n\tshort ubuf_waddr_p2=0;\n\tshort ubuf_waddr_p3=0;\n\n\tif(!enable)\n\t\treturn;\n\tfor(short i=0;i<param.pool_cnt;i++){\n#pragma HLS PIPELINE\n\n\t\tshort raddr = param.psum_raddr_start%512 + (pool_h_cnt+pool_kh_cnt)*param.pool_h_step\n\t\t\t\t+ (pool_w_cnt+pool_kw_cnt);\n\t\tfor(int j=0;j<MXU_COLNUM;j++){\n\t\t\tpsumreg[j] = psum_buffer[raddr][j];\n\t\t\tif(psumreg[j]<0&&param.isrelu)\n\t\t\t\tpsumrelu[j] = 0;\n\t\t\telse\n\t\t\t\tpsumrelu[j] = psumreg[j];\n\t\t\tif(pool_kw_cnt==0&&pool_kh_cnt==0){\n\t\t\t\tpsumpool[j] = psumrelu[j];\n\t\t\t}\n\t\t\telse if(param.maxpool){\n\t\t\t\tif(psumrelu[j]>psumpool[j])\n\t\t\t\t\tpsumpool[j] = psumrelu[j];\n\t\t\t}\n\t\t\telse{\n\t\t\t\tpsumpool[j] = psumpool[j] + psumrelu[j];\n\t\t\t}\n\t\t}\n\n\t\tif(pool_kw_cnt==param.pool_kw&&pool_kh_cnt==param.pool_kh){\n\t\t\tshort ubuf_waddr = param.ubuf_waddr_start + ubuf_waddr_p1 + ubuf_waddr_p2 + 
ubuf_waddr_p3;\n\t\t\tif(ubuf_waddr_p1==param.ubuf_waddr_end1){\n\t\t\t\tif(ubuf_waddr_p2==param.ubuf_waddr_end2){\n\t\t\t\t\tubuf_waddr_p2 = 0;\n\t\t\t\t\tubuf_waddr_p3 = ubuf_waddr_p3 + param.ubuf_waddr_step3;\n\t\t\t\t}\n\t\t\t\telse{\n\t\t\t\t\tubuf_waddr_p2 = ubuf_waddr_p2 + param.ubuf_waddr_step2;\n\t\t\t\t}\n\t\t\t}\n\t\t\telse{\n\t\t\t\tubuf_waddr_p1 = ubuf_waddr_p1 + param.ubuf_waddr_step1;\n\t\t\t}\n\t\t\tfor(int j=0;j<MXU_COLNUM;j++){\n\t\t\t\t// 64-bit product; long long is used because long is only 32 bits wide on some platforms\n\t\t\t\tlong long tmp;\n\t\t\t\ttmp = (long long)(psumpool[j])*(long long)(norm_coef[j]);\n\t\t\t\tint tmpcut = tmp>>32;\n\t\t\t\t// saturate to the int8 range\n\t\t\t\tap_int<8> res;\n\t\t\t\tif(tmpcut>127)\n\t\t\t\t\tres = 127;\n\t\t\t\telse if(tmpcut<-128)\n\t\t\t\t\tres = -128;\n\t\t\t\telse\n\t\t\t\t\tres = tmpcut;\n\t\t\t\tunified_buffer[ubuf_waddr][j] = res;\n\t\t\t}\n\t\t}\n\n\t\tif(pool_kw_cnt==param.pool_kw){\n\t\t\tpool_kw_cnt = 0;\n\t\t\tif(pool_kh_cnt==param.pool_kh){\n\t\t\t\tpool_kh_cnt = 0;\n\t\t\t\tif(pool_w_cnt==param.pool_w){\n\t\t\t\t\tpool_w_cnt = 0;\n\t\t\t\t\tpool_h_cnt = pool_h_cnt + param.pool_sh;\n\t\t\t\t}\n\t\t\t\telse{\n\t\t\t\t\tpool_w_cnt = pool_w_cnt + param.pool_sw;\n\t\t\t\t}\n\t\t\t}\n\t\t\telse{\n\t\t\t\tpool_kh_cnt = pool_kh_cnt + 1;\n\t\t\t}\n\t\t}\n\t\telse{\n\t\t\tpool_kw_cnt = pool_kw_cnt + 1;\n\t\t}\n\t}\n}\n"
  },
  {
    "path": "src/tb_tpu.cpp",
    "content": "#include \"tpu.h\"\n#include \"stdio.h\"\nint main(){\n\tap_uint<256> *ddr;\n\tap_uint<64> *ddr_instr;\n\tddr = (ap_uint<256> *)malloc(sizeof(ap_uint<256>)*(16384));\n\t//512*25+72*25+72+512\n\tddr_instr = (ap_uint<64> *)malloc(sizeof(ap_uint<64>)*3300);\n\tFILE *fid;\n\tfid = fopen(\"mlp_img.bin\",\"rb\");\n\tfread(ddr,32,25*512,fid);\n\tfclose(fid);\n\tfid = fopen(\"mlp_param.bin\",\"rb\");\n\tfread(ddr+512*25,32,25*72+72,fid);\n\tfclose(fid);\n\tfid = fopen(\"mlp_instr.bin\",\"rb\");\n\tap_uint<64> *ddr_instr_r = ddr_instr;\n\tint cnt = 0;\n\twhile(1==1){\n\t\tfread(ddr_instr_r,8,1,fid);\n\t\tap_uint<64> tmp = *ddr_instr_r;\n\t\tif(tmp.range(55,55)==1)\n\t\t\tbreak;\n\t\tddr_instr_r++;\n\t\tcnt++;\n\t}\n\tfclose(fid);\n\ttpu(ddr,ddr_instr);\n\tfid = fopen(\"golden_result.txt\",\"r\");\n\tint err = 0;\n\tfor(int i=0;i<512;i++){\n\t\tap_uint<256> val = ddr[512*25+72*25+72+i];\n\t\tint maxcof = -255;\n\t\tint idx = -1;\n\t\tint ref = -1;\n\t\tfor(int j=0;j<16;j++){\n\t\t\tint cof = val(j*8+7,j*8);\n\t\t\tif(cof>127)\n\t\t\t\tcof = cof-256;\n\t\t\tif(cof>maxcof){\n\t\t\t\tmaxcof = cof;\n\t\t\t\tidx = j;\n\t\t\t}\n\t\t}\n\t\tfscanf(fid,\"%d\",&ref);\n\t\tif(idx!=ref)\n\t\t\terr++;\n\t}\n\treturn err;\n}\n"
  },
  {
    "path": "src/tpu.cpp",
    "content": "\n#include \"tpu.h\"\n\nvoid ex_module(FEATDTYPE unified_buffer[16384][MXU_ROWNUM],WEIGHTDTYPE weight_buffer[512][MXU_COLNUM],\n\t\tap_int<32> norm_coef[MXU_COLNUM],MXU_PARAM mxuparam,RELPOOL_PARAM poolparam,\n\t\tbool is_MXU,bool is_relu_norm_pool){\n#pragma HLS INLINE off\n#pragma HLS DEPENDENCE variable=unified_buffer inter false\n#pragma HLS DEPENDENCE variable=unified_buffer intra false\n\tstatic PSUMDTYPE psum_buffer1[512][MXU_COLNUM];\n\tstatic PSUMDTYPE psum_buffer2[512][MXU_COLNUM];\n#pragma HLS ARRAY_PARTITION variable=psum_buffer1 complete dim=2\n#pragma HLS ARRAY_PARTITION variable=psum_buffer2 complete dim=2\n\tif((is_MXU&&mxuparam.psum_start<512) || (is_relu_norm_pool&&poolparam.psum_raddr_start>=512) )\n\t{\n\t\tMXU(unified_buffer,weight_buffer,psum_buffer1,mxuparam,is_MXU);\n\t\trelu_norm_pool(psum_buffer2,unified_buffer,norm_coef,poolparam,is_relu_norm_pool);\n\t}\n\telse{\n\t\tMXU(unified_buffer,weight_buffer,psum_buffer2,mxuparam,is_MXU);\n\t\trelu_norm_pool(psum_buffer1,unified_buffer,norm_coef,poolparam,is_relu_norm_pool);\n\t}\n\n}\n\nvoid tpu(ap_uint<256> *ddr,ap_uint<64> *ddr_instr){\n#pragma HLS INTERFACE m_axi depth=16384 port=ddr\n#pragma HLS INTERFACE m_axi depth=3300 port=ddr_instr\n\n\tstatic FEATDTYPE unified_buffer[16384][MXU_ROWNUM];\n#pragma HLS RESOURCE variable=unified_buffer core=RAM_S2P_BRAM\n\tstatic WEIGHTDTYPE weight_buffer[512][MXU_COLNUM];\n#pragma HLS RESOURCE variable=weight_buffer core=RAM_S2P_BRAM\n\tstatic ap_int<32> norm_coef[MXU_COLNUM];\n#pragma HLS ARRAY_PARTITION variable=unified_buffer complete dim=2\n#pragma HLS ARRAY_PARTITION variable=weight_buffer complete dim=2\n#pragma HLS ARRAY_PARTITION variable=norm_coef complete dim=0\n\n\tap_int<16> reggroup[96];\n#pragma HLS ARRAY_PARTITION variable=reggroup complete dim=0\n\tMXU_PARAM mxuparam;\n\tRELPOOL_PARAM poolparam;\n\tLDST_PARAM ldstparam;\n\tunsigned instr_offset = 0;\n\tbool is_load_weight;\n\tbool is_MXU;\n\tbool is_relu_norm_pool;\n\t// 
load img\n\tloadFeature(ddr,unified_buffer, 0,0, 512*25, true);\n\tbool eop = false;\n\tap_int<8> runmode = 0;\t//0 nop, bit[0] loadweight;bit[1] mxu; bit[2] pool; bit[7] eop;\n\tinstr(ddr_instr,instr_offset,reggroup,runmode,true);\n\twhile(runmode[7]==0)\n\t{\n#pragma HLS DEPENDENCE variable=unified_buffer inter false\n#pragma HLS DEPENDENCE variable=unified_buffer intra false\n#pragma HLS DEPENDENCE variable=weight_buffer inter false\n#pragma HLS DEPENDENCE variable=weight_buffer intra false\n\n\t\tconfig(reggroup,mxuparam,poolparam,ldstparam,norm_coef);\n\t\tis_load_weight = runmode[0]==1;\n\t\tis_MXU = runmode[1]==1;\n\t\tis_relu_norm_pool = runmode[2]==1;\n\t\tinstr(ddr_instr,instr_offset,reggroup,runmode,true);\n\t\tloadWeight(ddr,weight_buffer,ldstparam.weight_offset,ldstparam.weight_addr,\n\t\t\t\t\tldstparam.weight_ldlen,is_load_weight);\n\t\tex_module(unified_buffer,weight_buffer,norm_coef,mxuparam,poolparam,is_MXU,is_relu_norm_pool);\n\t}\n\n\tstoreFeature(ddr,unified_buffer, 512*25+72*25+72,14000, 512, true);\n}\n"
  },
  {
    "path": "src/tpu.h",
    "content": "#include \"ap_int.h\"\n\n#define MXU_COLNUM 32\n#define MXU_ROWNUM 32\n#define WEIGHTDTYPE char\n#define FEATDTYPE char\n#define PSUMDTYPE ap_int<32>\n\n\nstruct MXU_PARAM{\n\tbool isload;\n\tbool iscalc;\n\tbool isping;\n\tbool isfirstpsum;\n\n\tshort weight_raddr;\n\tshort ubuf_raddr_start;\n\tshort ubuf_raddr_step1;\n\tshort ubuf_raddr_step2;\n\tshort ubuf_raddr_step3;\n\tshort ubuf_raddr_end1;\n\tshort ubuf_raddr_end2;\n\tshort ubuf_raddr_end3;\n\tshort ubuf_raddr_num;\n\tshort psum_start;\n\tshort psum_step1;\n\tshort psum_end1;\n\tshort psum_step2;\n};\nstruct RELPOOL_PARAM{\n\tbool isrelu;\n\tshort psum_raddr_start;\n\n\tbool maxpool; // max pool or average pool\n\tchar pool_kw;\n\tchar pool_kh;\n\tchar pool_w;\n\tchar pool_sw;\n\tchar pool_sh;\n\tshort pool_cnt; // output_num*pool_kw*pool_kh\n\tshort pool_h_step;\n\n\tshort avg_val;\n\tap_uint<4> avg_shift;\n\n\tshort ubuf_waddr_start;\n\tshort ubuf_waddr_step1;\n\tshort ubuf_waddr_step2;\n\tshort ubuf_waddr_step3;\n\tshort ubuf_waddr_end1;\n\tshort ubuf_waddr_end2;\n\tshort ubuf_waddr_end3;\n};\n\nstruct LDST_PARAM{\n\tunsigned weight_offset;\n\tshort weight_addr;\n\tshort weight_ldlen;\n};\n\nvoid MXU(FEATDTYPE ubuf[16384][MXU_ROWNUM],WEIGHTDTYPE weight[512][MXU_COLNUM],\n\t\tPSUMDTYPE psum[512][MXU_COLNUM],MXU_PARAM mxuparam, bool enable);\nvoid relu_norm_pool(PSUMDTYPE psum_buffer[512][MXU_COLNUM],FEATDTYPE unified_buffer[16384][MXU_ROWNUM],\n\t\tap_int<32> norm_coef[MXU_COLNUM],RELPOOL_PARAM param, bool enable);\nvoid loadWeight(ap_uint<256> *ddr,WEIGHTDTYPE weight_buffer[512][MXU_COLNUM],\n\t\tunsigned offset,short addr, short len, bool enable);\nvoid loadFeature(ap_uint<256> *ddr,FEATDTYPE unified_buffer[512][MXU_ROWNUM],\n\t\tunsigned offset,short addr, short len, bool enable);\nvoid storeFeature(ap_uint<256> *ddr,FEATDTYPE unified_buffer[512][MXU_COLNUM],\n\t\tunsigned offset,short addr, short len, bool enable);\nvoid instr(ap_uint<64> *ddr,unsigned &offset,ap_int<16> 
reggroup[96],ap_int<8> &runmode,bool enable);\nvoid config(ap_int<16> reggroup[96],MXU_PARAM &mxuparam,RELPOOL_PARAM &poolparam,\n\t\tLDST_PARAM &lsdtparam, ap_int<32> norm_coef[32]);\nvoid tpu(ap_uint<256> *ddr,ap_uint<64> *ddr_instr);\n"
  }
]